Reputation from danwat1234 - Linus Tech Tips

danwat1234 got a reaction from dogwitch in BOINC Pentathlon 2024 5 hours ago

Darn, and some portion of the funds for my BOINC farm came from delivering food to Steeplechase apartments in Vancouver, Washington! Oh well.
Somewhere in this thread there was a discussion about one of the projects being multi-staged, a math-based one I believe, so some BOINCers would get work from the earlier steps, and about whether there is enough load in the project to run it smoothly. Maybe I was dreaming, or you could point me to more info. Sporadic apps will be something else.
Yes, I don't have any batch config manipulation. All ~265 computers are remoted into one at a time, and almost all of them are on Wi-Fi in one house, so there is always maintenance. Switching projects nicely requires setting 'no new work' on all of them days in advance, to be nice to the project I am taking a break from.
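For anyone in the same spot: BOINC ships a command-line tool, `boinccmd`, that can talk to a remote client over GUI RPC, so 'no new work' can be scripted rather than clicked on each box. This is a minimal sketch under a few assumptions: the host names, project URL and password below are hypothetical placeholders, every client must already accept remote GUI RPC from the controlling machine (e.g. via its `remote_hosts.cfg`), and the URL has to match the project URL exactly as the client has it.

```python
import subprocess

# Hypothetical placeholders: substitute your own host names, project URL and
# the GUI RPC password from each client's gui_rpc_auth.cfg.
HOSTS = ["cruncher-001", "cruncher-002"]
PROJECT_URL = "https://example.org/project/"
GUI_RPC_PASSWORD = "changeme"


def nomorework_cmd(host: str, password: str, project_url: str) -> list:
    """Build a boinccmd call that sets 'no new tasks' for one project on one host."""
    return [
        "boinccmd",
        "--host", host,
        "--passwd", password,
        "--project", project_url, "nomorework",
    ]


def apply_to_all(hosts, password, project_url, dry_run=True):
    """Print (dry run) or execute the command for every host; return hosts that failed."""
    failed = []
    for host in hosts:
        cmd = nomorework_cmd(host, password, project_url)
        if dry_run:
            print(" ".join(cmd))
            continue
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        except (OSError, subprocess.TimeoutExpired):
            failed.append(host)  # boinccmd not installed, or the call hung
            continue
        if result.returncode != 0:  # a non-zero exit status is treated as failure
            failed.append(host)
    return failed


if __name__ == "__main__":
    apply_to_all(HOSTS, GUI_RPC_PASSWORD, PROJECT_URL)
```

Running it with `dry_run=True` first lets you eyeball the commands; swapping `nomorework` for `allowmorework` undoes it when you come back to the project.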

danwat1234 reacted to leadeater in BOINC Pentathlon 2024 5 hours ago

FYI, if nobody has already posted it: SiDock has been removed as the Steeplechase project, and a new one will be selected with a shorter event run time.

@danwat1234 RIP there goes our one chance for a first or 2nd place 🙃

danwat1234 reacted to leadeater in BOINC Pentathlon 2024 15 hours ago

Just FYI the Steeplechase project might be getting changed to something else

danwat1234 got a reaction from leadeater in BOINC Pentathlon 2024 21 hours ago

Nice, guess I'll pwn that one! Good timing, because I think shortly after this time period SiDock may have a quiet period as it transitions to another batch after the huge, long "corona_RdRp_v2" batch, currently 84% complete. https://www.sidock.si/sidock/server_status.php

danwat1234 reacted to Egon3 in BOINC Pentathlon 2024 21 hours ago

Hah, I had the same problem during the Pentathlon last year; it ended up happening while I was at work, and I lost pretty much a day's worth of work as a result. I've made it a point before every Folding/BOINC event to check for Windows updates on all my systems the day before, then pause updates for 2-3 weeks.

danwat1234 reacted to leadeater in BOINC Pentathlon 2024 21 hours ago

Yes, but the point is that having to use it is a symptom of the issue; needing it at all is the problem itself, because you shouldn't need to. All Process Lasso is doing is statically setting the affinity masks below, which most of the time isn't actually a good thing. It is what we want in this instance, but it's also something a process can set correctly, and not statically, by itself. This is exactly why I'm saying it shouldn't be necessary.

If you're not telling Windows to group your processes into a common NUMA node, you're going to have problems. If you also force your processes into NUMA Node 0, you'll also have problems.

This is something not even Cinebench does correctly above 64 threads per NUMA node, which is a configuration I have on some of my servers.
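To illustrate why the 64-thread boundary matters: on Windows an affinity mask (`KAFFINITY`) is 64 bits wide and only applies within one processor group, which is what the `Group` and `Mask` fields of the `GROUP_AFFINITY` structure express. The sketch below computes the (group, mask) pairs needed to cover one NUMA node. It assumes, for simplicity, that logical CPU `i` sits in group `i // 64` at bit `i % 64`; Windows builds the real group layout at boot, so treat that mapping as hypothetical.

```python
from collections import defaultdict

GROUP_SIZE = 64  # a Windows processor group holds at most 64 logical processors


def group_affinities(logical_cpus):
    """Map global logical CPU indices to {group: bitmask}, like a list of GROUP_AFFINITY.

    Simplifying assumption: CPU i lives in group i // 64 at bit i % 64.
    """
    masks = defaultdict(int)
    for cpu in logical_cpus:
        group, bit = divmod(cpu, GROUP_SIZE)
        masks[group] |= 1 << bit
    return dict(masks)


# Hypothetical node with 96 threads (CPUs 0-95): one 64-bit mask cannot cover it.
node0 = range(0, 96)
for group, mask in group_affinities(node0).items():
    print(f"group {group}: mask {mask:#018x}")
```

A 96-thread node needs two pairs; code that only ever sets a single mask can reach at most 64 of those threads.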

Anyway:

As you can see, if a new thread is spawned on a NUMA node/processor core that is already busy doing work, rather than on the other, completely idle one, then it's not the Windows scheduler; someone has 100% done something wrong somewhere. This is not how Windows behaves by default when nothing overrides what it would normally do.

https://empyreal96.github.io/nt-info-depot/CS490_Windows_Internals/08_Scheduling.pdf

And:
https://learn.microsoft.com/en-us/windows/win32/procthread/numa-support
https://learn.microsoft.com/en-us/windows/win32/procthread/multiple-processors

If you aren't touching affinity masks and aren't specifically starting your control thread on Node 0, then there is no reason a new process would start on, or always run on, Node 0 when Node 1 is idle. So if it is happening, the Windows scheduler alone is not to blame.

Win32 Affinity Masks

Slurm is a cluster job scheduler; it's not something you would use, or need to use, at the single-server level. Both Windows and Linux have the required scheduler flags, and those are actually what slurm uses (slurm is Linux-only, mind you). Any application you run on a system can do what slurm does.
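On the Linux side, the mechanism slurm relies on is available to any program: the kernel lists each NUMA node's CPUs in sysfs, and Python's `os.sched_setaffinity` wraps the `sched_setaffinity` system call. This is a minimal sketch assuming a Linux host with `/sys/devices/system/node`; `parse_cpulist` only handles the plain comma-and-range format that sysfs prints (e.g. `0-7,16-23`), not every syntax the kernel accepts as input.

```python
import os


def parse_cpulist(text: str) -> list:
    """Parse a Linux cpulist string such as '0-7,16-23' into a list of CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_cpus(node: int) -> list:
    """Read the CPUs belonging to one NUMA node from sysfs (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def pin_to_node(node: int, pid: int = 0) -> None:
    """Restrict a process (0 = the calling process) to one node's CPUs."""
    os.sched_setaffinity(pid, node_cpus(node))
```

CPU affinity alone does not bind memory; `numactl --cpunodebind=0 --membind=0 <command>` does both from the shell.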

danwat1234 reacted to leadeater in BOINC Pentathlon 2024 21 hours ago

Detecting instruction sets isn't particularly difficult. You're also talking about optimizing the code for the architecture it's running on, which is a different thing entirely from ensuring that your application plays nicely with the system when running multiple instances or processes of it. It's just a completely different aspect from optimizing the code to run fast on CPU XYZ.
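As a rough illustration of the "not particularly difficult" part: on x86 Linux the kernel exposes CPU feature flags in `/proc/cpuinfo`, and choosing a code path from them takes a few lines. The kernel names (`avx512`, `avx2_fma`, `generic`) are hypothetical, and real libraries such as gwnum also tune for microarchitecture details that flags alone don't reveal.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags from the first 'flags' line of /proc/cpuinfo (x86 Linux)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def pick_kernel(flags: set) -> str:
    """Choose the widest code path the CPU reports support for (hypothetical names)."""
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags and "fma" in flags:
        return "avx2_fma"
    return "generic"
```

On a live system you would feed it the contents of `/proc/cpuinfo`. Nothing here tells the program how many other instances share the machine, which is the separate problem the post is about.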

Neither Prime95 nor PrimeGrid is going to know that you, for whatever reason, want to run multiple instances on a system, so unless they have catered for that, you can easily end up without sufficiently accurate resource allocation. When I was running P95 to get the benchmark figures, it allocated based on how many cores and tasks you want to run at the same time, etc., so P95 certainly can do it; it's part of that application, and from what I observed it does it correctly.

PrimeGrid, on the other hand, has to work within what BOINC allows in order to track jobs, give out points, etc. Ideally you'd configure the same parameters on the project website as you do in P95, so the task that gets generated and issued to your system knows it's configured for, e.g., 6 threads × 2 tasks, and you'd get credited the right amount of points based on that and the run time. That way the main task is aware of both task processes and can allocate them to different NUMA nodes, for example. However, I have no idea whether that is actually possible with BOINC at all.
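A minimal sketch of what such a task-aware allocation could look like, assuming the main task already knows the topology as a hypothetical mapping of node id to logical CPUs. Each requested task gets its own block of CPUs that is never split across nodes, and goes to the node with the most CPUs still free, so "6 threads × 2 tasks" on a two-node box lands one task per node. SMT siblings and memory placement are ignored here.

```python
def plan_tasks(nodes, tasks, threads_per_task):
    """Place `tasks` tasks of `threads_per_task` threads each onto NUMA nodes.

    nodes: {node_id: [logical CPUs on that node]} (hypothetical topology input).
    Each task goes whole to the node with the most CPUs still free (lowest node
    id on ties), so on equal nodes tasks spread out before any node doubles up.
    Raises ValueError if no single node can hold another task.
    """
    free = {node: list(cpus) for node, cpus in nodes.items()}
    plan = []
    for _ in range(tasks):
        candidates = [n for n in sorted(free) if len(free[n]) >= threads_per_task]
        if not candidates:
            raise ValueError("not enough CPUs on any single node for another task")
        node = max(candidates, key=lambda n: len(free[n]))
        plan.append((node, free[node][:threads_per_task]))
        free[node] = free[node][threads_per_task:]
    return plan


nodes = {0: list(range(0, 12)), 1: list(range(12, 24))}
for node, cpus in plan_tasks(nodes, tasks=2, threads_per_task=6):
    print(f"task on node {node}: CPUs {cpus[0]}-{cpus[-1]}")
```

With two 12-thread nodes, the first task gets CPUs 0-5 on node 0 and the second gets CPUs 12-17 on node 1, rather than both stacking on node 0.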

Since the above is not how it's being done, you have to rely more on BOINC and Windows scheduling, while also making sure the process you are starting tells them the right information and tries to reserve the correct things, which lets the resource allocators do a better, more correct job. BOINC should be looking at the tasks it has allocated and ensure that it is not overlapping task resource allocations when there are unutilized system resources. I can't think of a situation in a BOINC context where, if BOINC starts task 1 on NUMA Node 0, you'd also want it to allocate task 2 to NUMA Node 0 when it goes to start it; logically, if NUMA Node 1 exists, then that is the preferred place to allocate it. From what I understand, BOINC tasks/jobs are independent and don't need to talk to each other or share memory, which is when you would want them on the same NUMA node, e.g. an SQL DB process and an application/web process on the same system/OS, which give the highest performance running on the same NUMA node. ESXi actually has detection for that at the VM level: if it sees two VMs talking to each other a lot, it'll allocate them to the same NUMA node (sometimes you want to disable that, but almost never).
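The placement rule described above can be sketched in a few lines. This is a hypothetical policy, not how BOINC or ESXi actually implement it: independent tasks go to the NUMA node with the fewest busy threads, while tasks tagged with the same group (standing in for "these two talk to each other a lot") are kept together on one node. Node capacity is deliberately ignored to keep it short.

```python
def place(tasks, nodes):
    """Assign each task to a NUMA node.

    tasks: list of (name, threads, group). Tasks sharing a non-None group are
    co-located on one node; independent tasks (group None) go to whichever node
    currently has the fewest busy threads (lowest node id on ties).
    nodes: list of node ids. Capacity limits are not modelled.
    """
    load = {n: 0 for n in nodes}
    group_node = {}
    placement = {}
    for name, threads, group in tasks:
        if group is not None and group in group_node:
            node = group_node[group]
        else:
            node = min(nodes, key=lambda n: load[n])
            if group is not None:
                group_node[group] = node
        load[node] += threads
        placement[name] = node
    return placement


jobs = [
    ("boinc_task_1", 8, None),
    ("boinc_task_2", 8, None),
    ("sql_db", 4, "web"),
    ("web_app", 4, "web"),
]
print(place(jobs, [0, 1]))
```

Here the two independent BOINC tasks end up on different nodes, while the DB and web processes share a node.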

This is what slurm can do for example:

https://slurm.schedmd.com/salloc.html

Also: https://slurm.schedmd.com/cpu_management.html
See above. slurm puts a lot more intelligence into allocating resources than BOINC does, by necessity, but you, the user, have to set the correct parameters when submitting jobs into the job queue, or it'll run poorly or, worse, not at all. The other thing you don't want to do is put anything in your application code that would conflict with the slurm allocator, like static thread allocations; or, if you need them, make sure you match them when submitting into slurm.
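As a small illustration of "set the correct parameters and don't fight the allocator": the sketch builds a `salloc` request using its `--nodes`, `--ntasks` and `--cpus-per-task` options, and on the application side sizes its thread pool from the `SLURM_CPUS_PER_TASK` environment variable that slurm sets when `--cpus-per-task` is given, instead of a hard-coded number. The 2 × 6 shape and the fallback default are example values only.

```python
import os


def salloc_args(tasks: int, cpus_per_task: int, nodes: int = 1) -> list:
    """Build a salloc request for `tasks` tasks with `cpus_per_task` CPUs each."""
    return [
        "salloc",
        f"--nodes={nodes}",
        f"--ntasks={tasks}",
        f"--cpus-per-task={cpus_per_task}",
    ]


def worker_threads(default: int = 1) -> int:
    """Size the thread pool from what slurm granted rather than a hard-coded count.

    SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested,
    so fall back to `default` outside such a job.
    """
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    return int(value) if value else default


print(" ".join(salloc_args(tasks=2, cpus_per_task=6)))
```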

Very roughly, I'd say slurm actually lets you do a little less of this work in the application/process code, since you have to define a lot of it at job submission, but you still have to be careful that you've done your thread allocations correctly in code and not done something bad.

As to my comment about researchers, do remember that while they can be, for example, math experts who know how to get the best out of a CPU for a particular calculation, that doesn't mean they understand a lot of other aspects of coding and system design. In the same way, I understand system design and the interaction between NUMA nodes and PCIe devices (GPUs, NICs), but that doesn't mean I have sufficient coding experience and knowledge to do anything with that understanding. I could, for example, tell you not to use more than 2 GPUs per server node even though there are 4, because it's 2 per CPU and NVLink is not being used, so you'd get much less performance if you tried to use all 4.
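To make the GPU example concrete, the rule is "only use the GPUs attached to the CPU socket your process runs on". On Linux each PCI device reports its node in `/sys/bus/pci/devices/<address>/numa_node`; the sketch assumes that mapping has already been collected into a hypothetical dict and turns it into a `CUDA_VISIBLE_DEVICES` value.

```python
def local_gpus(gpu_numa: dict, node: int) -> list:
    """Return the GPU indices attached to the given NUMA node (its own CPU socket)."""
    return sorted(gpu for gpu, gpu_node in gpu_numa.items() if gpu_node == node)


def cuda_visible_devices(gpus: list) -> str:
    """Format a GPU list for the CUDA_VISIBLE_DEVICES environment variable."""
    return ",".join(str(g) for g in gpus)


# Hypothetical 2-socket server: GPUs 0-1 hang off CPU 0, GPUs 2-3 off CPU 1, no NVLink.
topology = {0: 0, 1: 0, 2: 1, 3: 1}
print(cuda_visible_devices(local_gpus(topology, node=0)))  # 0,1
```

CUDA's default device numbering isn't guaranteed to follow PCI order, so setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` keeps the indices consistent with the topology you looked up.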

danwat1234 reacted to porina in BOINC Pentathlon 2024 21 hours ago

I have no idea what slurm is, but if I had access to an HPC resource, the software would have visibility of the system, would it not? We don't necessarily have that case with project-level code under BOINC.

If you hang around mersenneforum you'll find the writer of Prime95, whose math library (gwnum) is used in LLR and does the heavy lifting. For CUL, LLR2 is used, which is a branch of LLR modified to enable fast checking. gwnum does take microarchitecture optimisations into consideration, such as availability of instructions (e.g. AVX-512), differences in implementations of those instructions (1 vs 2 unit AVX-512, and whatever AMD did in Zen 4), and different cache sizes. The performance-critical parts were stated as being done in assembly.

danwat1234 reacted to leadeater in BOINC Pentathlon 2024 21 hours ago

@danwat1234 One of the events chosen this year is SiDock which you are currently running, yayyyyy 😀

danwat1234 got a reaction from leadeater in BOINC Pentathlon 2024 Saturday at 09:57 PM

Thanks for letting me know about the event, but I think I'm dedicated to 2 biological projects at this time.

danwat1234 reacted to leadeater in BOINC Pentathlon 2024 Saturday at 04:52 PM

@danwat1234 You able to help out?

danwat1234 got a reaction from wONKEyeYEs in F@H and BOINC Badge Request Thread [Last Update: 2024-MAY-03] March 11

Ah, thanks much. I don't think it shows on mobile, however (Android Chrome). The middle pic is regular mobile mode.
It does show on mobile on the profile page, though. Well, ty.

danwat1234 got a reaction from xander.audio in I leveled up my house water cooling! February 23

You should use a throttled blowiematron fan directly on that AC adapter, no heatsink. Or maybe mount it on the ground together with a fan, because concrete is a decent heat sink.
Edit: [And possibly get a 2nd adapter and wire it in parallel to distribute the load]

As ScottJarriel6761 said in a YouTube comment, getting a heat pump water heater would help reduce temperature and humidity!

danwat1234 reacted to madprofessor207 in Folding Community Board February 13

On TechLinked they talked about Zluda, which allows CUDA to run on Intel and AMD GPUs. It was abandoned by AMD, but it is open source. Could it be used to make AMD cards better at folding CUDA WUs? Or for BOINC?

danwat1234 got a reaction from mattheginger in BOINC Community Board February 4

Latest electric bill shows 304 kWh used per day (304 / 24 ≈ 12.67 kW average). House members use little power ourselves; it's mostly the crunchers. That includes a ~1 PB Chia farm at maybe 1.3 kW. House heat is always off, and it is gas. SiDock #1, Rosetta #10 because WCG is down. Usually WCG gets most of the work because SiDock does not allow much work unit caching, and Rosetta usually has no work but coincidentally does now.
Note: I have undervolted all the boxes I can as far as possible to increase efficiency. Skylake-class typically undervolts by 140 mV, Haswell by 70 mV.
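The bill arithmetic, plus a rough idea of why undervolting pays off, as a small sketch. The undervolt estimate uses the usual first-order CMOS approximation that dynamic power scales with voltage squared at a fixed clock; the 1.20 V load voltage is a hypothetical example, and leakage savings are not modelled.

```python
def average_kw(kwh_per_day: float) -> float:
    """Average continuous draw implied by a daily energy figure."""
    return kwh_per_day / 24


def dynamic_power_ratio(v_nominal: float, undervolt_mv: float) -> float:
    """First-order estimate: dynamic power scales with V^2 at a fixed clock."""
    v_new = v_nominal - undervolt_mv / 1000
    return (v_new / v_nominal) ** 2


print(f"{average_kw(304):.2f} kW average")                     # 12.67 kW average
print(f"{1.3 / average_kw(304):.0%} of that is the Chia farm")  # 10% of that is the Chia farm
# Hypothetical 1.20 V load voltage; real values vary per chip and clock.
print(f"{dynamic_power_ratio(1.20, 140):.2f}x dynamic power after -140 mV")  # 0.78x
```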

danwat1234 reacted to randy123 in Buying a New Car is Stupid January 30

There is hardly any maintenance on EVs.
Stop lying about EVs.

danwat1234 reacted to randy123 in Buying a New Car is Stupid January 30

Stopped reading your post there; you're lying in your first sentence.
Enjoy typing all that out for me not to read it, if you're going to just spread misinformation.

danwat1234 reacted to HoldSquat in Folding Community Board January 23

yea you did!

@danwat1234 you were so close to passing me. I finished 1 WU for 146k 1 minute before midnight. If it wasn't for that and my Pi, you def would have passed me. Great folding everyone, and thanks for hosting the birthday bash @GOTSpectrum. It was nerdy fun!

danwat1234 got a reaction from WiscoMetro in Buying a New Car is Stupid January 22

326K miles, yes miles, on my 2013 Chevy Volt. Abusive food delivery miles. Seriously, jaw-droppingly abusive. Electric drive is a win! I just bought another 4 cheap just in case I ever need to bring it to active status.
Has a Kolchuga skid plate.

1999 Civic automatic 260K miles
Pic is a bit old, car not quite the same.

danwat1234 reacted to Needfuldoer in Buying a New Car is Stupid January 22

Of course buying a new car is stupid; they don't make the Volt anymore.

Please make a follow-up video about aftermarket car speakers! That poor head unit is being held back by 20-year-old Honda speakers.

Nice! The GMT400 C/K was the high water mark in pickup design, in my opinion, and those first gen S10s are like baby GMT400s.

Mini pickups are underappreciated.

danwat1234 reacted to leadeater in Birthday Bash Folding Sprint January 19

danwat1234 reacted to Captainmarino in Birthday Bash Folding Sprint January 19

danwat1234 reacted to Yrahcaz91 in Birthday Bash Folding Sprint January 19

Every time I see a WU finish...

danwat1234 reacted to Captainmarino in Birthday Bash Folding Sprint January 19

I'd like to propose DP be named the official movie of Folding Birthday Bash Folding Sprint 2024!

... I... might need to mention... for some reason... that's Down Periscope.

danwat1234 reacted to HoldSquat in Birthday Bash Folding Sprint January 19

Gotta squeeze out them few extra points
