Camofelix

Member
  • Posts: 114

Reputation Activity

  1. Agree
    Camofelix reacted to Levent in Should I use windows 11 or Linux   
    This question is like asking what you should wear today. It depends. Each has its own pros and cons. What are you trying to achieve?
  2. Agree
    Camofelix reacted to Sauron in GCC up to 50% SLOWER vs LLVM (Clang) and Intel compilers   
    well that's not great 😆 hopefully it's fixed soon
  3. Informative
    Camofelix got a reaction from Sauron in GCC up to 50% SLOWER vs LLVM (Clang) and Intel compilers   
    Not quite sure what you mean 🤔
    Went further down the rabbit hole, and it's a bug in how glibc (the GNU C library) and GCC do malloc.

    Replacing the memory allocation subroutines with TCMalloc, jemalloc or Hoard all yielded *massive* uplifts in performance, leading to GCC 12 surpassing ICC and Clang 11 (when those two are using glibc malloc).

    I haven't had the time to integrate TCMalloc etc. with oneAPI yet, but hope to do so soon.
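For anyone wanting to try an allocator swap themselves, here is a minimal sketch of one common approach on Linux: preloading the allocator's shared library with `LD_PRELOAD` and timing the same binary. This is not necessarily how the runs above were done (the allocator could equally be linked in at build time), and the helper names and any paths you pass in are hypothetical.

```python
import os
import subprocess
import time


def preload_env(allocator_lib, base_env=None):
    """Return a copy of the environment with LD_PRELOAD set to allocator_lib.

    allocator_lib is a path to a shared library (hypothetical, e.g. a
    locally installed tcmalloc or jemalloc .so). The input environment
    is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = allocator_lib
    return env


def time_command(cmd, env=None):
    """Run cmd to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


def compare_allocators(cmd, allocator_libs):
    """Time cmd with the default allocator and with each preloaded one.

    allocator_libs maps a label to a shared-library path. Returns a dict
    of label -> seconds, with the key "default" for the unmodified run.
    """
    results = {"default": time_command(cmd)}
    for label, lib in allocator_libs.items():
        results[label] = time_command(cmd, env=preload_env(lib))
    return results
```

Calling `compare_allocators(["./binary_tree"], {"jemalloc": "/path/to/libjemalloc.so"})` would then give one timing per allocator. Both the binary name and the library path in that call are placeholders.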
  4. Funny
    Camofelix got a reaction from soldier_ph in Asus ROG preparing to launch DDR4 to DDR5 adapters to help deal with Scalpers and RAM shortage   
    Summary
     
    Found floating online in a few small circles of beta/early testers are what look like prototypes/samples of ASUS ROG branded DDR4 to DDR5 adapter risers with integrated power and logic circuitry.

    These custom adapters with integrated components are needed because of the difference in architecture on the DIMMs themselves. Thankfully, Alder Lake sports a memory controller that supports both DDR4 and DDR5, and if validated on the ROG boards, this could lead to a transitional adapter for early adopters who already own top-of-the-line DDR4 and are waiting for DDR5 to mature.
    Quotes
     
    My thoughts
    It's an interesting case we find ourselves in. This sort of device would only be possible if Asus had either foreseen this issue, overbuilt their trace signalling beyond even normal spec, or both.

    What will be interesting is how, with this adapter, the highest-end Asus boards in combination with the highest-end Alder Lake processors will be able to OC the current top-of-the-line DDR4 DIMMs for potential OC world records.

    EDIT: For the sake of clarification, I'd like to highlight that the board in the video is a prototype that is intentionally oversized for easier debugging with tools such as an oscilloscope. If this product were to come to market, I would be very surprised to see it be even 1/3 as tall as the prototype shown in the video.
     
    Sources
     
  5. Informative
    Camofelix got a reaction from GoodBytes in Asus ROG preparing to launch DDR4 to DDR5 adapters to help deal with Scalpers and RAM shortage   
    Addressing the size, there's no reason to minimize space on this sort of prototype product. From the PCB, it seems to be Revision R1.00T.

    More relevant is that the larger size makes it easier to attach oscilloscope probes to the pads exposed on the PCB near the ROG logo, making it much easier to debug any issues.

    As for overbuilding motherboards: as you extend traces, and then jump from one discrete material to another (from the main board to the DIMM's pins, for example), you get signal bounce back, creating noise on the lines amongst other issues.

    LTT's cable testing video illustrates a simplified version of this problem.

    If the initial signal from an overbuilt board is cleaner than that from a lower end board, the odds of success with the higher end board, while not guaranteed, are higher.

    It's the same reason an OC motherboard will often have only one memory slot per channel: to avoid bounce back*.

    *Technically, reflection issues with 2 DIMMs per channel depend on T-topology vs daisy-chain topology, but that's beyond the scope of this post.
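To put a rough number on the "bounce back" idea: at any point where the impedance changes, the fraction of the incoming voltage wave that reflects is given by the standard reflection coefficient, (Z_load - Z_line) / (Z_load + Z_line). A minimal sketch follows. The impedance values in the usage note are purely illustrative and are not measurements of any real board or adapter.

```python
def reflection_coefficient(z_load, z_line):
    """Fraction of the incident voltage wave reflected at a discontinuity.

    z_line is the characteristic impedance the signal is travelling on,
    z_load is the impedance it sees at the transition (both in ohms).
    0.0 means a perfect match; the sign gives the polarity of the reflection.
    """
    if z_load + z_line == 0:
        raise ValueError("impedances must not sum to zero")
    return (z_load - z_line) / (z_load + z_line)


def reflected_amplitude(v_incident, z_load, z_line):
    """Voltage amplitude of the reflected wave for a given incident wave."""
    return v_incident * reflection_coefficient(z_load, z_line)
```

With made-up values, a 50 ohm line running into a 60 ohm section gives `reflection_coefficient(60.0, 50.0)` of 1/11, so roughly 9% of the wave bounces back. Every extra connector or adapter adds another such transition, which is why a cleaner starting signal improves the odds.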

     
  6. Informative
    Camofelix got a reaction from Bombastinator in Asus ROG preparing to launch DDR4 to DDR5 adapters to help deal with Scalpers and RAM shortage   
  7. Informative
    Camofelix got a reaction from EphraimK in Asus ROG preparing to launch DDR4 to DDR5 adapters to help deal with Scalpers and RAM shortage   
  8. Informative
    Camofelix got a reaction from AbydosOne in Asus ROG preparing to launch DDR4 to DDR5 adapters to help deal with Scalpers and RAM shortage   
  9. Like
    Camofelix got a reaction from Dogzilla07 in Asus ROG preparing to launch DDR4 to DDR5 adapters to help deal with Scalpers and RAM shortage   
  10. Informative
    Camofelix got a reaction from Sauron in GCC up to 50% SLOWER vs LLVM (Clang) and Intel compilers   
    Turns out it isn't Alder Lake at all. It's pervasive as far back as Nehalem on all GCC versions.

    There didn't seem to be a lot of interest on the LTT forums, so I stopped updating this thread. I've been tracking this on the Level1Techs forums instead, and the main L1T thread has much more info: https://forum.level1techs.com/t/wip-testing-update-its-not-just-alder-lake-it-goes-back-to-nehalem-gcc-50-performance-regressions-vs-clang-and-intel-compilers-in-specific-workloads-across-all-opt-settings/179712/10

    I've dug through a lot of the assembly, but haven't gone *all the way down* the rabbit hole, as it were. (If you count 100+ different runs as not going all the way down, I guess 😂)

    It seems GCC is trying to pre-cache instructions a lot, almost N64 instruction cache style, using way more registers at times and wasting cycles.

     
  11. Informative
    Camofelix got a reaction from Biohazard777 in GCC up to 50% SLOWER vs LLVM (Clang) and Intel compilers   
    TL;DR: Programs compiled with GCC (versions 7-12) are taking up to 50% longer to complete vs those compiled by the open source LLVM-based Clang (11-13) and Intel ICX, and the closed source Intel ICC compiler.

    Hi Ladies and Gents,

    I've been working on profiling the ins and outs of how Alder Lake works with various workloads in various environments as a way of previewing how Sapphire Rapids, which utilizes the same Golden Cove core, will perform in HPC tasks.
     
    To that end, I've already published a few hundred results on Twitter in different scenarios with different kernels, compilers, memory sub-timings, etc. Those can be found here:
    External Link
     
    Of interest for today however is this test of Binary trees:

    Compiler  Version  Time taken
    gcc       7        379.943716
    gcc       8        395.665537
    gcc       9        373.488119
    gcc       10       392.596422
    gcc       11       382.825910
    gcc       12       390.466340
    clang     11       256.381165
    clang     12       290.616438
    clang     13       284.877824
    intel     icc      249.630150
    intel     icx      250.511041

    Above tests were completed with a tree size of 26.
    The above was the output after running the test 20 times, and results were within run-to-run variation of +/- 0.2%.
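For a quick way to turn those timings into "how much longer" figures, here is a small sketch. The numbers are copied from the output above. Grouping them per compiler family and comparing means is my own choice of summary, not something the test itself reports.

```python
from statistics import mean

# Timings copied from the benchmark output above, grouped by compiler family.
TIMES = {
    "gcc": [379.943716, 395.665537, 373.488119,
            392.596422, 382.825910, 390.466340],
    "clang": [256.381165, 290.616438, 284.877824],
    "intel": [249.630150, 250.511041],
}


def extra_time_percent(slow, fast):
    """How much longer `slow` takes than `fast`, as a percentage."""
    return (slow / fast - 1.0) * 100.0


def family_means(times):
    """Mean time per compiler family."""
    return {name: mean(values) for name, values in times.items()}


def gcc_penalty(times):
    """Extra time of the gcc mean relative to every other family's mean."""
    means = family_means(times)
    return {
        name: extra_time_percent(means["gcc"], value)
        for name, value in means.items()
        if name != "gcc"
    }
```

On these figures, `gcc_penalty(TIMES)` comes out at about 39% longer than the Clang mean and about 54% longer than the Intel mean.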

    Git with the test can be found here: https://github.com/FCLC/Choosing-a-compiler-performance-testing-GCC_ICC_ICPX_NVCC_CLANG_HIP/tree/main/Binary_tree


    Would love to see results from anyone else and their thoughts
  12. Like
    Camofelix reacted to Sauron in GCC up to 50% SLOWER vs LLVM (Clang) and Intel compilers   
    Interesting, I wonder how much of this is due to Alder Lake being new and possibly not fully optimized for in GCC... might try this on Sandy Bridge later.
  13. Informative
    Camofelix reacted to Bombastinator in First full DIY in nearly a decade-need modern case recommendation   
    CDs are basically gone as well. Things have gone to online downloads. There isn’t a whole lot of use for 5.25” bays anymore, and a lot less for 3.5” bays, especially externally accessible ones.
     
    I think enterprise might still have drive cages because SSDs have issues with wear in heavy-use situations. Sometimes HDDs are better.
  14. Like
    Camofelix got a reaction from Lightwreather in Intel 12th Gen Core Alder Lake for Desktops: Top SKUs Only, Coming November 4th +Z690 Chipset   
    Older thread, so apologies for the bump, everyone.

    Wanted to let people know that I've since gotten my hands on the 12700K and begun testing AVX-512 performance in different HPC applications.

    Some quick data can be found here: https://openbenchmarking.org/result/2112040-TJ-2111077TJ72&hgv=i7-12700K+P-Cores+%2B+AVX-512+DDR4&ppt=D

    As of now I'm working on testing how performance scales with different cache sizes per core, as a way of previewing the Golden Cove cores that will be in Sapphire Rapids (same basic topology, and by disabling a given number of cores we can approximate a given amount of shared L3 per core).
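The core-disabling trick is just arithmetic on the shared cache. A rough sketch is below, under two assumptions that are not in the post: that the whole L3 stays usable when cores are switched off, and a 25 MB L3 for the i7-12700K (taken from its spec sheet, so double-check it for your part).

```python
def l3_per_core_mb(total_l3_mb, active_cores):
    """Shared L3 available per active core, in MB.

    Assumes the full L3 remains usable regardless of how many cores
    are disabled, which is only an approximation of real behaviour.
    """
    if active_cores < 1:
        raise ValueError("at least one core must be active")
    return total_l3_mb / active_cores


def l3_sweep(total_l3_mb, max_cores):
    """L3 per core for every active-core count from max_cores down to 1."""
    return {n: l3_per_core_mb(total_l3_mb, n) for n in range(max_cores, 0, -1)}
```

For example, `l3_sweep(25.0, 8)` runs from 3.125 MB per core with all eight P-cores active up to the full 25 MB with a single core, which gives a range of cache-per-core points to benchmark against.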

    The only thing this doesn't allow testing for directly is the new AMX instructions, but they seem to have an AVX fallback (at a corresponding performance hit).

    Feel free to search for #avx512 on Twitter and you should be able to find any experiments that I or other collaborators are working on!

    An interesting part is that the i7 in AVX-512 mode can obliterate the i9 in all-core mode for AVX-512 workloads like CFD and other engineering workloads.
  15. Like
    Camofelix got a reaction from igormp in Intel 12th Gen Core Alder Lake for Desktops: Top SKUs Only, Coming November 4th +Z690 Chipset   
  16. Informative
    Camofelix got a reaction from leadeater in Intel 12th Gen Core Alder Lake for Desktops: Top SKUs Only, Coming November 4th +Z690 Chipset   
    First patch from Intel acknowledging AVX-512 on Alder Lake, but marking it as unsupported.
    Phoronix article here: https://phoronix.com/scan.php?page=news_item&px=Intel-Alder-Lake-Tuning-GCC
    Actual compiler patch here: https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583958.html

     
  17. Informative
    Camofelix got a reaction from Lightwreather in Intel 12th Gen Core Alder Lake for Desktops: Top SKUs Only, Coming November 4th +Z690 Chipset   
  18. Informative
    Camofelix got a reaction from MageTank in Intel 12th Gen Core Alder Lake for Desktops: Top SKUs Only, Coming November 4th +Z690 Chipset   
    Same on my end. The URL points to Google user content, so it's probably a permission issue.
  19. Agree
    Camofelix got a reaction from Lightwreather in Intel 12th Gen Core Alder Lake for Desktops: Top SKUs Only, Coming November 4th +Z690 Chipset   
  20. Agree
    Camofelix reacted to igormp in Intel 12th Gen Core Alder Lake for Desktops: Top SKUs Only, Coming November 4th +Z690 Chipset   
    Doesn't work for me either, so goblins it is.
  21. Agree
    Camofelix reacted to LAwLz in Intel 12th Gen Core Alder Lake for Desktops: Top SKUs Only, Coming November 4th +Z690 Chipset   
    1) Chances are you will be fine with a mid range chip for 5-6 years as well, so you buying more performance than you need today will most likely not translate to your PC lasting longer without an upgrade.
    2) Buying a really expensive high end computer and keeping it for let's say 10 years often ends up costing more than two mid-range computers over the same 10 years, and you often get higher performance in the end as well.
     
    Stop trying to "future proof" PCs by overpaying for them today. In 99.9% of cases it's a waste of money.
     
     
     
    I keep hearing people say DDR5 is expensive but I don't really see it. If you are only looking at capacity then yes, DDR5 is way more expensive than DDR4. But that's like saying "DDR4 from Crucial sure is expensive. 32GB of 4400MHz DDR4 from Crucial is 405 dollars. This 2400MHz kit from G.Skill is only 95 dollars for the same capacity".
     
    Capacity is only one out of several factors when looking at RAM.
    If you are someone who only cares about capacity and doesn't want high speed RAM then sure, DDR5 sucks for you. But it doesn't suck because DDR5 is inherently expensive. It sucks because you are basically forced to buy the equivalent of "high end DDR4".
    It's kind of like saying SSDs are expensive because a 256GB SSD might cost as much as a 1TB HDD. Saying that "SSDs are 4 times as expensive as HDDs" doesn't mean much since we are ignoring the speed benefit.
     
    Also, Alder Lake can work with DDR4, so it's not like you must buy DDR5. Judging by some benchmarks I've seen you won't really lose much performance anyway. It seems like Alder Lake with DDR4 is the best option for most buyers looking for a new CPU. Unless they can find some Ryzen chip on sale for way below MSRP.
     
     
    And as for power consumption, it seems like only the i9 uses ridiculous amounts of power. The i5 and i7 seem fine and are roughly the same as the AMD R5 and R7 for most common workloads.
  22. Funny
    Camofelix reacted to LAwLz in Intel 12th Gen Core Alder Lake for Desktops: Top SKUs Only, Coming November 4th +Z690 Chipset   
    I'm glad that Linus addressed all the AMD fanboys in the latest WAN-show as well.
    It is so painfully obvious when someone is a fanboy as soon as they start talking about how it "isn't fair" to use DDR5 for Intel and DDR4 for AMD.
     
    I guess the next mental gymnastics AMD fanboys will pull is "just wait for Zen 4", because that's the typical go-to argument for fanboys when they have been beaten:
    -A fanboy
     
    It's like reading fanfic.
  23. Agree
    Camofelix reacted to leadeater in Intel 12th Gen Core Alder Lake for Desktops: Top SKUs Only, Coming November 4th +Z690 Chipset   
    Yep, I remember that. It was my understanding that it was going to be completely disabled, to the point of also being disabled in the microcode and impossible to enable. I don't know where things got missed or mixed up in the information flow, or if it's actually a mistake and isn't supposed to be possible. Who knows at this point, and we likely never will.
     
    Worst case, a new stepping is released that actually does outright disable it, in which case the value of old-stepping CPUs will go up (no, I'm not saying to buy the bloody things as an investment).
  24. Agree
    Camofelix got a reaction from leadeater in Intel 12th Gen Core Alder Lake for Desktops: Top SKUs Only, Coming November 4th +Z690 Chipset   
    Follow-up on AVX-512 support:

    Here's a post from der8auer showing how to turn it on for Asus boards:
     
    Please note: the parts about AnandTech calling it a leak and being wrong, etc., are flat-out misleading mischaracterizations of the facts.
     
    See this twitter thread for context:
     
  25. Informative
    Camofelix got a reaction from thechinchinsong in Intel 12th Gen Core Alder Lake for Desktops: Top SKUs Only, Coming November 4th +Z690 Chipset   
    Yup, but where Zen struggles is that, due to the lack of AVX-512, it can't pre-pack items like the initial portions of CFD solvers before offloading them to an accelerator.
     
    (Warning: CFD nerd time:)
     
    Specifically important is the decrease in the number of clocks required to process AVX-512 in the processor, in combination with both the larger window and the wider decoder. I don't have the paper in front of me ATM, but IIRC just computing the total prepack before sending it off to a GPU was 3-4x slower with AVX2 vs AVX-512, going as far back as Skylake.
    This linear stage typically takes up about 30% of each time step per simulation, while things like the actual KSP solvers were taking ~60-65% but could be parallelized to the point where Amdahl's law was starting to kick our butts.
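To make the Amdahl's law point concrete, here is a small sketch using the rough shares quoted above (about 30% for the linear prepack stage, about 65% for the solvers). The worker count and the 3.5x stage speedup in the usage note are illustrative picks of mine, not measured values.

```python
def overall_speedup(fraction, stage_speedup):
    """Amdahl's law: overall speedup when `fraction` of the runtime
    is made `stage_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


def speedup_limit(fraction):
    """Upper bound on overall speedup as the accelerated part's cost
    approaches zero; the untouched remainder sets the ceiling."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return 1.0 / (1.0 - fraction)
```

With a 65% solver share, `speedup_limit(0.65)` caps the whole time step at under 3x no matter how many workers the solvers get, and `overall_speedup(0.65, 64)` is already close to that ceiling. That is why shaving the ~30% linear stage matters: `overall_speedup(0.30, 3.5)` estimates what a 3.5x faster prepack alone would buy.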
     
    Joys of FOSS, since my base stack is OpenMP, UCX, PETSc and OpenFOAM.
     
     
    All to say, the very specific changes in architecture that Sapphire Rapids (SR) brings to AVX-512 look very promising for CFD workloads, and being able to have some of that in a home system under my desk will be superb for rapid iteration on code.
     
     
     
     