
I almost cried... - Fixing the $40,000 PC Part 2

nicklmg
1 hour ago, Sakai 121 said:

Agreed, I would prefer "Troubleshooting 6 editors 1 CPU - Part 2". I seriously don't get clickbait these days. "I almost cried": why would a random dude click on a recommended video hoping to see another dude crying, only to find out that it's tech troubleshooting, get hooked, and become a subscriber? Wouldn't it make more sense that a random dude who likes the idea of tech troubleshooting sees a video called "Troubleshooting 6 editors 1 CPU - Part 2", clicks on it, finds it interesting, and becomes a subscriber?

It is 2 CPUs.


The problem I've had with this series is the haphazard way he's approaching this. I think it's a great experiment, and if I had the hardware I would try it too. However, his troubleshooting steps don't make sense. Start with one GPU and one VM. Get it working, then add another. Also, don't try porting over other VMs. Clean install each one. That's not necessary, but it is optimal. This is basic troubleshooting and project management. You don't throw 6 GPUs of all different types in at once and then try to troubleshoot the abundance of errors that could take place. Setting all the hardware out in the open and stacked on top of each other is the icing on the cake.

I really like watching troubleshooting videos. I do computer repair and system management for a living and I enjoy seeing solutions other people come up with so I can have them in my back pocket just in case. This has been maddening though. It makes for a confusing video and ruins the excitement of the project itself. I just about screamed "FINALLY" into the void when he said it was time to go "back to basics" and install everything "part by part". 

Linus, I doubt you'll see this, but for my sanity please review troubleshooting methods that help efficiently eliminate variables. It will make your life easier and might even make for a good tech quickie video at some point.

CompTIA A+ Certified

 

"We are all cups, quietly and constantly being filled. The trick is knowing how to tip yourself and let the good things pour out." - Ray Bradbury


That looks like damaged hardware to me. You might want to consider being at least a bit more hesitant to touch electrical contacts. If you're grounded you should be okay, but you never know. The best policy is to avoid touching them whenever you can. I only say this because I'm wondering if a component is "partially" fried. That might explain the erratic behavior: it works great until a particular logic path is used, then everything goes haywire. This kind of inconsistent behavior can definitely be indicative of such a problem.

Certifications: A+, Security+, CCNA Routing & Switching


11 hours ago, RossMadness said:

The problem I've had with this series is the haphazard way he's approaching this. I think it's a great experiment, and if I had the hardware I would try it too. However, his troubleshooting steps don't make sense. Start with one GPU and one VM. Get it working, then add another. Also, don't try porting over other VMs. Clean install each one. That's not necessary, but it is optimal. This is basic troubleshooting and project management. You don't throw 6 GPUs of all different types in at once and then try to troubleshoot the abundance of errors that could take place. Setting all the hardware out in the open and stacked on top of each other is the icing on the cake.

I really like watching troubleshooting videos. I do computer repair and system management for a living and I enjoy seeing solutions other people come up with so I can have them in my back pocket just in case. This has been maddening though. It makes for a confusing video and ruins the excitement of the project itself. I just about screamed "FINALLY" into the void when he said it was time to go "back to basics" and install everything "part by part". 

Linus, I doubt you'll see this, but for my sanity please review troubleshooting methods that help efficiently eliminate variables. It will make your life easier and might even make for a good tech quickie video at some point.

Very much agreed. He could have saved hours of time by using a simple, methodical approach, rather than this somewhat nutty approach he tends to prefer. The methodical approach is usually fast. His approach will either work fast, or can take days.

Certifications: A+, Security+, CCNA Routing & Switching


18 hours ago, GDRRiley said:

You're almost crying over this?

I'd likely already be if my 40K+ PC was acting like this.

@GDRRiley Nvidia and Intel probably sponsored most of it and selling the lambo should cover it, I guess..

16 hours ago, Luscious said:

Throwing together stuff to try to do what it wasn't originally designed for is going to end up being either bleeding edge or stupid.

 

The difference is that you don't get cut doing stupid.

 

These Frankenstein tech experiments are a recipe for killing expensive hardware. 24/7 up-time scenarios and especially mission critical applications depend on reliable off-the-shelf solutions that have been designed to do a specific task. There's a reason why companies like Tyan and Supermicro specifically sell through outfits such as Thinkmate. No reputable systems administrator on the back end is going to "push the boundaries" with server grade components and then explain to his boss how that $40K box in the company's rack unexpectedly died and needs replacing - that guy would be out of a job TOMORROW GUARANTEED.

 

Doing it with free hardware? I would say there's a limit to how many $1000+ items you can kill before the manufacturer stops sending you those items. Doing it in exchange for ad revenue and views on YouTube? I would say there are more entertaining and cheaper ways to get that.

 

Next video - Overclock your heatsink with a leaf blower!!!

[Sponsored by Juan Deere]

Not just for views, but I think some inner child deep inside Linus just loves to mess around with this stuff and inevitably drop it. He has done all the Frankenstein tech experiments possible on cheapo hardware, so now he has moved on to $10,000 CPUs and GPUs.


I am kind of confused. What has just happened? Did Linus just give up?

 

What software was used here? Is it some kind of hypervisor? I do not understand.


With each video (and these are very cool videos btw., I really like seeing Linus genuinely excited about this) I'm more and more convinced that if he somehow makes it work, it will not serve any real world purpose apart from being a proof of concept.

 

So many problems during setup make me think that troubleshooting this on a daily basis might be necessary and a massive PITA.

CPU: i7 6950X  |  Motherboard: Asus Rampage V ed. 10  |  RAM: 32 GB Corsair Dominator Platinum Special Edition 3200 MHz (CL14)  |  GPUs: 2x Asus GTX 1080ti SLI 

Storage: Samsung 960 EVO 1 TB M.2 NVME  |  PSU: In Win SIV 1065W 

Cooling: Custom LC 2 x 360mm EK Radiators | EK D5 Pump | EK 250 Reservoir | EK RVE10 Monoblock | EK GPU Blocks & Backplates | Alphacool Fittings & Connectors | Alphacool Glass Tubing

Case: In Win Tou 2.0  |  Display: Alienware AW3418DW  |  Sound: Woo Audio WA8 Eclipse + Focal Utopia Headphones


As many have pointed out, I would troubleshoot the hardware with a simpler setup, because as of now we simply have no clue where the problem comes from.

 

I think I saw Linus using a single socket motherboard. So, I would make a single CPU, single GPU setup work first. Then add up to four video cards to the setup. Then swap the CPU. Then swap the RAM. Then validate the remainder of the video cards. If everything works up to that point, you should only be left with three possibilities:

1. The dual socket motherboard is defective.

2. The software doesn't fully support your hardware, which doesn't help much.

3. You are facing a hardware compatibility issue, which also doesn't help much.

 

Are you still using the Beta BIOS from your PCIe hot swap video? If so, could you try switching to a release BIOS just to make sure some of your issues aren't coming from that BIOS (if that is indeed a Beta BIOS)?
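
For anyone who wants the "add one part at a time" procedure spelled out, here is a minimal Python sketch. It is only an illustration: `first_failing_part`, the part names, and `fake_stability_check` are all hypothetical, and in real life the stability check is a human booting the machine and running a VM, not a function call.

```python
from typing import Callable, List, Optional, Sequence


def first_failing_part(
    baseline: Sequence[str],
    candidates: Sequence[str],
    is_stable: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add candidates one at a time to a known-good baseline.

    Returns the first part whose addition makes the build unstable,
    or None if every part could be added without trouble.
    """
    build = list(baseline)
    if not is_stable(build):
        raise ValueError("baseline itself is unstable; shrink it first")
    for part in candidates:
        build.append(part)
        if not is_stable(build):
            return part
    return None


# Hypothetical stand-in for a real boot-and-test cycle:
# pretend the fourth GPU is the faulty one.
def fake_stability_check(build: List[str]) -> bool:
    return "gpu4" not in build


culprit = first_failing_part(
    baseline=["cpu1", "ram_kit_a", "gpu1"],
    candidates=["gpu2", "gpu3", "gpu4", "gpu5", "gpu6"],
    is_stable=fake_stability_check,
)
print(culprit)
```

The point of the structure is that only one variable changes between two tests, so a failure points at exactly one part (or at its interaction with what is already installed).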


Hello all,

I'm also working on an unRAID setup with 4 GPUs and ran into the same issues.

I got around them by deleting the VM in unRAID without deleting its virtual hard drive, then creating a new VM and attaching the existing virtual disk.

It seems unRAID gets confused when a device that is still in the VM's config is removed, and then it starts misbehaving.

The other way is to remove the PCIe devices in the VM settings BEFORE physically removing them.

 

 

I hope Linus will read this or a team member can tell him this.

 

PS: It's a Z10PED16_WS board with two Xeon E5-2696 v3 CPUs.

 

 

Regards

Bengele
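
A rough way to spot the situation described above (a VM config that still references a PCIe device that is no longer in the machine) is sketched below in Python. It assumes the VM definition is a libvirt-style domain XML with `<hostdev type='pci'>` entries; whether a given unRAID version stores it exactly this way is an assumption. `stale_hostdevs` and the sample XML are hypothetical, and the set of present devices is passed in by hand here (on a Linux host it could be built from the names under `/sys/bus/pci/devices`).

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def pci_address(addr: ET.Element) -> str:
    """Format a libvirt <address> element as 'dddd:bb:ss.f'."""
    domain = int(addr.get("domain", "0x0"), 16)
    bus = int(addr.get("bus", "0x0"), 16)
    slot = int(addr.get("slot", "0x0"), 16)
    function = int(addr.get("function", "0x0"), 16)
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"


def stale_hostdevs(domain_xml: str, present: Set[str]) -> List[str]:
    """List passed-through PCI addresses that are not present on the host."""
    root = ET.fromstring(domain_xml)
    stale = []
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "pci":
            continue
        # Only the <source> address is the host device; skip the guest one.
        addr = hostdev.find("source/address")
        if addr is not None and pci_address(addr) not in present:
            stale.append(pci_address(addr))
    return stale


# Hypothetical VM config with two passed-through cards.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source><address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/></source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source><address domain='0x0000' bus='0x81' slot='0x00' function='0x0'/></source>
    </hostdev>
  </devices>
</domain>
"""

# Pretend the card at 81:00.0 was pulled out of the machine.
print(stale_hostdevs(SAMPLE_XML, present={"0000:03:00.0"}))
```

Any address this reports would be a candidate for removing from the VM settings before the next boot.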

 


3 hours ago, Bengele said:

I hope Linus will read this or a team member can tell him this. 

I am sure it will get around to him. Perhaps he will make an auxiliary video, to update the older video, then have a thumbnail picture of Linus wiping away a tear with a Kleenex, then have his facial expression blurred then a clear image of a happy Linus.


Does anyone know if he is using the 'iommu=pt' parameter? If not, it could help improve performance on the host (i.e. the 10G Ethernet card and the Optane drive).

 
Quote

 

Describe the issue:

When preparing RHEL for PCI device assignment (for a KVM VM), the IOMMU feature is required. Currently, the document instructs to use "iommu=on". This does work, but it can have significant performance side effects. "iommu=on" enables IOMMU for all devices, even if they are not used for device assignment by KVM. When using IOMMU for devices in the host, extra overhead is introduced for each DMA operation. This can reduce performance of some IO adapters like 10Gb networking significantly.

Suggestions for improvement:

The most appropriate option to use for device assignment is "iommu=pt". This enables the IOMMU only for device assignment, which is exactly what is needed.


 

https://bugzilla.redhat.com/show_bug.cgi?id=1201503
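
To answer the "is he using it?" question on any given box, the kernel command line can be inspected. Below is a small Python sketch; `iommu_options` and `uses_passthrough_mode` are hypothetical helper names, and the sample command line is made up. On a running Linux host the real string can be read from `/proc/cmdline`.

```python
from typing import Dict


def iommu_options(cmdline: str) -> Dict[str, str]:
    """Collect IOMMU-related parameters from a kernel command line string."""
    opts = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key in ("iommu", "intel_iommu", "amd_iommu"):
            opts[key] = value if sep else ""
    return opts


def uses_passthrough_mode(cmdline: str) -> bool:
    """True if the command line contains iommu=pt."""
    return iommu_options(cmdline).get("iommu") == "pt"


# Made-up example command line, not taken from the video.
sample = "BOOT_IMAGE=/bzimage initrd=/bzroot intel_iommu=on iommu=pt"
print(iommu_options(sample))
print(uses_passthrough_mode(sample))

# On a real Linux host (not executed here):
#   with open("/proc/cmdline") as f:
#       print(uses_passthrough_mode(f.read()))
```

This only tells you what the kernel was booted with; whether it actually helps the 10G card or the Optane drive would still have to be measured.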


Linus should work step by step: one GPU, SSD, RAM stick, and even CPU at a time.

In Part 3 (if it comes out), Linus should work slowly, and then I am sure everything will get fixed.

That is how a PC gets troubleshot, and how each VM ends up working as it is supposed to.

Please quote or tag me @Void Master,so i can see your reply.

 

Everyone was a noob at the beginning, don't be discouraged by toxic trolls even if u lose 15 times in a row. Keep training and pushing yourself further and further, so u can show those sorry lots how it's done !

Be a supportive player, and make sure to reflect a good image of the game community you are a part of. 

Don't kick a player unless they willingly want to ruin your experience.

We are the gamer community, we should take care of each other !


  • 4 weeks later...

Would that motherboard and CPUs even support that many PCIe lanes?

I only learned about this when I was comparing processors I was considering. I have an i7-6850K and two 1080s, and I was thinking of getting an i9-9900K; the 6850K supports up to 40 PCIe lanes, while the 9900K only supports 16.
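
The lane arithmetic is simple enough to write down. This Python sketch uses only the two lane counts quoted above (40 and 16) and assumes, for illustration, that each of the two graphics cards wants a full x16 link; `lanes_short` is a hypothetical helper, and chipset-provided lanes are ignored.

```python
from typing import List


def lanes_short(cpu_lanes: int, card_widths: List[int]) -> int:
    """Return how many lanes the cards want beyond what the CPU offers."""
    return max(0, sum(card_widths) - cpu_lanes)


two_cards = [16, 16]  # assumption: both cards at full x16

print(lanes_short(40, two_cards))  # i7-6850K: 40 lanes
print(lanes_short(16, two_cards))  # i9-9900K: 16 lanes
```

A shortfall does not mean the cards stop working; boards typically split the available lanes so each card runs on a narrower link (for example x8/x8). The same sum is the thing to check for a six-GPU build across two CPUs.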


  • 2 weeks later...

Solution?: Remove the graphics cards that are not identical to the others and replace them so that all are identical. Then replace the power supply with a unit of larger capacity (you know how devices act funny when they're underpowered?). Now start over, adding cards one at a time.

 

If you bump into another problem, it's probably the motherboard, or the CPUs (at which point, you could try a blog site that's had similar problems..).

 

You could try changing brands of motherboard? ...

 

If you HAVE to, replace EVERYTHING (starting with the most obvious, one piece at a time...). If, after everything is replaced, your project STILL doesn't work, then it's a flop. Send everything back, or whatever.

 

JD
