R720 Tesla Card

TubsAlwaysWins · October 23, 2024

Looking at picking up a Tesla card for my PowerEdge R720 and had a couple of questions.

Do I need dual 1100W PSUs to run the Tesla Cards?
Can I get the Teslas to offer GPU acceleration to a VM when using something like RDP? (I plan to drive multiple monitors and am not super familiar with how RDP handles that.)
Which Tesla card offers the best price to performance? Looking at the K80/M40/M60/P40/P100 cards.
Is it worth upgrading to a Tesla card over a 1050 Ti? I've heard they play nicer with VMs. I currently have my 1050 Ti in passthrough to one of my Windows VMs.

Here are the current specs:

2x Xeon E5-2660 v2 - 20c / 40t combined
128GB RAM
GTX 1050 Ti
2x 750W PSUs
VMware ESXi 6.7 U3

Thoughts?

I'm just thinking about it for now. Not committed to it yet.

Needfuldoer · October 23, 2024

If you just want to play games on one VM, stick with the card you already have. Those cards are all 8 to 10 years old, and they're not even good for doing any AI stuff because they all lack Tensor cores.

Jeff from Craft Computing has a bunch of videos about all the hoops he has to jump through to make these cards game.

6 minutes ago, TubsAlwaysWins said:

Do I need dual 1100W PSUs to run the Tesla Cards?

You'll need at least one. You're looking at 250 watt GPUs on top of your dual 95 watt processors.
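
For anyone following the arithmetic, here is a rough sketch that adds up only the two ratings quoted above (one 250 W GPU, two 95 W CPUs) and shows what share of a 750 W or 1100 W supply that is. It deliberately ignores RAM, drives, fans and conversion losses, and the function names are made up for illustration, so treat the result as a floor and not as a substitute for Dell's own PSU requirements.

```python
# Ratings quoted in the thread; everything else in the chassis is NOT counted.
GPU_WATTS = 250   # per Tesla-class card, as stated above
CPU_WATTS = 95    # per processor, as stated above
CPU_COUNT = 2


def known_load_watts(gpu_count: int) -> int:
    """Sum only the CPU and GPU ratings mentioned in the thread."""
    return gpu_count * GPU_WATTS + CPU_COUNT * CPU_WATTS


def share_of_psu(load_watts: int, psu_watts: int) -> float:
    """Fraction of a single PSU's rating taken up by the known load."""
    return load_watts / psu_watts


if __name__ == "__main__":
    for gpus in (1, 2):
        load = known_load_watts(gpus)
        for psu in (750, 1100):
            pct = share_of_psu(load, psu)
            print(f"{gpus} GPU(s): {load} W known load, {pct:.0%} of one {psu} W PSU")
```

With redundant supplies, a single PSU has to be able to carry the whole load on its own, which is why the comparison is against one unit and not the pair.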

You will also need to get the right PCIe riser board with a single x16 slot and a power connector. I believe the part number is CPVNF.

I've fit dual OEM RTX 3060s into an R730, which is a very similar chassis. They got a little warm because they're not designed for flow-through cooling, but they work. Their only shortcoming is that they don't officially support chopping them up into slices for multiple VMs.

jaslion · October 23, 2024

38 minutes ago, Needfuldoer said:

for doing any AI stuff because they all lack Tensor cores.

That, and their gaming performance is low. The P100 here is basically an RX 580, when it works, which is not always.

It also doesn't even support all the current CUDA features anymore due to age.

They seem to go for $200-$250, and you can get MUCH better used for that. Hell, a 1080 Ti is achievable. Or, if you REALLY need AI acceleration, the 2060 12GB is available at that price and the 3060 12GB can also be found used for that.

TubsAlwaysWins · October 24, 2024

On 10/23/2024 at 9:24 AM, Needfuldoer said:

If you just want to play games on one VM, stick with the card you already have. Those cards are all 8 to 10 years old, and they're not even good for doing any AI stuff because they all lack Tensor cores.

Jeff from Craft Computing has a bunch of videos about all the hoops he has to jump through to make these cards game.

You'll need at least one. You're looking at 250 watt GPUs on top of your dual 95 watt processors.

You will also need to get the right PCIe riser board with a single x16 slot and a power connector. I believe the part number is CPVNF.

I've fit dual OEM RTX 3060s into an R730, which is a very similar chassis. They got a little warm because they're not designed for flow-through cooling, but they work. Their only shortcoming is that they don't officially support chopping them up into slices for multiple VMs.

Not looking for gaming, just a good workstation card. I believe I already have the PCIe risers I need; I just need to purchase the power cables.

TubsAlwaysWins · October 24, 2024

On 10/23/2024 at 10:07 AM, jaslion said:

That, and their gaming performance is low. The P100 here is basically an RX 580, when it works, which is not always.

It also doesn't even support all the current CUDA features anymore due to age.

They seem to go for $200-$250, and you can get MUCH better used for that. Hell, a 1080 Ti is achievable. Or, if you REALLY need AI acceleration, the 2060 12GB is available at that price and the 3060 12GB can also be found used for that.

Alright, I'll look into maybe a 1080 card or something. I think a roommate has a 2060 I can use.

Needfuldoer · October 24, 2024

Just now, TubsAlwaysWins said:

Alright, I'll look into maybe a 1080 card or something. I think a roommate has a 2060 I can use.

Your biggest limitation will be cooler size. You can't go much bigger than a two-slot reference card because the heatsink faces "down" toward the motherboard, and you're height-limited by the two cards being next to each other. (And if you have an R720XD with rear 2.5" bays, they'll eat into one slot's available space.) That's the reason I went with OEM 3060s (that and the heatsink fins point front-to-back instead of up-and-down).

You should just need the GPU power cable, part number 9H6FV, to feed any consumer video card that takes up to 6 + (6+2) PCIe power.
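
As a rough illustration of what "6 + (6+2)" allows, the sketch below adds up the standard PCIe power limits: 75 W from the slot, 75 W per 6-pin and 150 W per 8-pin (a 6+2 plug counts as an 8-pin). The function and table names are hypothetical, and this only reflects the connector limits, not what a particular riser or cable can actually deliver.

```python
# Standard PCIe power limits per source, in watts.
CONNECTOR_WATTS = {
    "slot": 75,    # x16 slot itself
    "6-pin": 75,   # one 6-pin auxiliary connector
    "8-pin": 150,  # one 8-pin (or 6+2) auxiliary connector
}


def max_board_power(aux_connectors: list[str]) -> int:
    """Upper bound on card power: slot power plus every auxiliary connector."""
    return CONNECTOR_WATTS["slot"] + sum(CONNECTOR_WATTS[c] for c in aux_connectors)


if __name__ == "__main__":
    # A card fed by one 6-pin and one 6+2 (treated as 8-pin) connector.
    print(max_board_power(["6-pin", "8-pin"]), "W ceiling for 6 + (6+2)")
    # A card fed by a single 6+2 connector.
    print(max_board_power(["8-pin"]), "W ceiling for a single 6+2")
```

So a card that needs more than a 6-pin plus an 8-pin would be outside what this cable arrangement is meant to feed.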

TubsAlwaysWins · October 24, 2024

16 minutes ago, Needfuldoer said:

Your biggest limitation will be cooler size. You can't go much bigger than a two-slot reference card because the heatsink faces "down" toward the motherboard, and you're height-limited by the two cards being next to each other. (And if you have an R720XD with rear 2.5" bays, they'll eat into one slot's available space.) That's the reason I went with OEM 3060s (that and the heatsink fins point front-to-back instead of up-and-down).

You should just need the GPU power cable, part number 9H6FV, to feed any consumer video card that takes up to 6 + (6+2) PCIe power.

Yeah, no chance a 3-slot card is fitting. I do not have the XD variant, so I could slap two 2-slot GPUs in if I wanted.

The reason I was looking at Teslas is mainly because their cooling setup works well in this chassis.

Thanks for the part number!

digitalscream · October 24, 2024

On 10/23/2024 at 4:07 PM, jaslion said:

That, and their gaming performance is low. The P100 here is basically an RX 580, when it works, which is not always.

It also doesn't even support all the current CUDA features anymore due to age.

They seem to go for $200-$250, and you can get MUCH better used for that. Hell, a 1080 Ti is achievable. Or, if you REALLY need AI acceleration, the 2060 12GB is available at that price and the 3060 12GB can also be found used for that.

Well, the P100 is effectively a 1080 with much faster HBM2 RAM, and 12GB or 16GB of it. I have two of them here, and they're actually significantly faster than a 1080 when running LLMs - mainly because memory bandwidth is one of the biggest indicators of LLM performance.

Obviously you're never going to get H100-like performance out of them, but I've had 30+ tokens/s when running 20GB GGUFs across both cards - that's totally usable, and something you couldn't do with anything short of a 3090.
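
To put a number on the memory-bandwidth point, here is a back-of-the-envelope sketch. It assumes token generation is purely bandwidth-bound, so every generated token requires reading the whole model from VRAM once, which makes bandwidth divided by model size a ceiling on tokens per second. The bandwidth figures are nominal spec-sheet values I'm assuming (roughly 732 GB/s for the 16GB P100, 320 GB/s for a GTX 1080), and the function name is made up for illustration.

```python
# Assumed nominal memory bandwidths in GB/s (spec-sheet figures, not measurements).
P100_16GB_BANDWIDTH = 732.0
GTX_1080_BANDWIDTH = 320.0


def max_tokens_per_second(model_gb: float, bandwidth_gb_s: float) -> float:
    """Ceiling on generation speed if each token reads the full model once."""
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    model_gb = 7.0  # hypothetical model small enough to fit an 8GB GTX 1080
    p100 = max_tokens_per_second(model_gb, P100_16GB_BANDWIDTH)
    gtx = max_tokens_per_second(model_gb, GTX_1080_BANDWIDTH)
    print(f"P100 ceiling:     {p100:.1f} tokens/s")
    print(f"GTX 1080 ceiling: {gtx:.1f} tokens/s")
    print(f"Ratio:            {p100 / gtx:.2f}x")
```

Real throughput lands below these ceilings, but the ratio between the two cards is the part that matters here.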

I don't know what the prices are like over in the US on the second hand market, but I bought a pair of P100s for £250, and a 3090 is over £600 here.

They're also nicely handy in server chassis, because they take EPS 8-pin power rather than PCIe 6+2.

I'm not saying they're the ideal solution - I mean, they'll generally pull about 210W each when running inference - but they're not the crazy useless solution a lot of folk make them out to be either.

TubsAlwaysWins · October 25, 2024

17 hours ago, digitalscream said:

Well, the P100 is effectively a 1080 with much faster HBM2 RAM, and 12GB or 16GB of it. I have two of them here, and they're actually significantly faster than a 1080 when running LLMs - mainly because memory bandwidth is one of the biggest indicators of LLM performance.

Obviously you're never going to get H100-like performance out of them, but I've had 30+ tokens/s when running 20GB GGUFs across both cards - that's totally usable, and something you couldn't do with anything short of a 3090.

I don't know what the prices are like over in the US on the second hand market, but I bought a pair of P100s for £250, and a 3090 is over £600 here.

They're also nicely handy in server chassis, because they take EPS 8-pin power rather than PCIe 6+2.

I'm not saying they're the ideal solution - I mean, they'll generally pull about 210W each when running inference - but they're not the crazy useless solution a lot of folk make them out to be either.

Gonna be honest, I don't know what half of the acronyms you used mean, but good to know about the performance.

There is a 'Cracked PCB' P100 for $75 on eBay right now... Looks like the crack is just the PCIe locking tab... Other than that, it looks like they sell for about $300 USD.

Thanks

digitalscream · October 27, 2024

On 10/25/2024 at 3:55 PM, TubsAlwaysWins said:

Gonna be honest, I don't know what half of the acronyms you used mean, but good to know about the performance.

There is a 'Cracked PCB' P100 for $75 on eBay right now... Looks like the crack is just the PCIe locking tab... Other than that, it looks like they sell for about $300 USD.

Thanks

Basically, it's all about AI workloads - that's what I used them for. LLMs (Large Language Models) are effectively the AI model, and they need huge amounts of VRAM when running on GPUs. GGUF is the file format those models are commonly distributed in, usually quantized (think "lossy compression") to make them smaller.

Worth noting, for anybody reading along and thinking of using multiple Teslas for that: if you're using multiple GPUs to fit larger models, they don't run in parallel and give better performance. In fact, you get lower performance, because it's effectively two GPUs trying to act as one, and communicating across the PCIe bus is much slower than a GPU's internal fabric.
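
A toy model of why splitting a model across cards doesn't speed things up, assuming the layers are divided between the GPUs and each token passes through them one after another. Each GPU reads only its share of the weights, but the shares are read in sequence, so the total read time matches a single card, and every hand-off between cards adds a delay on top. The 732 GB/s bandwidth and the 1 ms hand-off cost are placeholder assumptions, not measurements, and the function name is hypothetical.

```python
def tokens_per_second_layer_split(
    model_gb: float,
    gpu_count: int,
    bandwidth_gb_s: float,
    hop_seconds: float,
) -> float:
    """Bandwidth-bound token rate when layers are split across identical GPUs.

    The GPUs work one after another, so their read times add back up to the
    time one GPU would need for the whole model; each hand-off costs extra.
    """
    read_seconds = gpu_count * ((model_gb / gpu_count) / bandwidth_gb_s)
    transfer_seconds = (gpu_count - 1) * hop_seconds
    return 1.0 / (read_seconds + transfer_seconds)


if __name__ == "__main__":
    model_gb = 20.0      # model size from the thread
    bandwidth = 732.0    # assumed nominal GB/s per card
    hop = 0.001          # hypothetical 1 ms per cross-card hand-off
    # gpu_count=1 is an imaginary card with enough VRAM to hold the whole model.
    for gpus in (1, 2):
        rate = tokens_per_second_layer_split(model_gb, gpus, bandwidth, hop)
        print(f"{gpus} GPU(s): about {rate:.1f} tokens/s at best")
```

The second card buys capacity, not speed: it lets the model fit at all, at the cost of the hand-off time.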

TubsAlwaysWins · November 4, 2024

On 10/27/2024 at 6:22 PM, digitalscream said:

Basically, it's all about AI workloads - that's what I used them for. LLMs (Large Language Models) are effectively the AI model, and they need huge amounts of VRAM when running on GPUs. GGUF is the file format those models are commonly distributed in, usually quantized (think "lossy compression") to make them smaller.

Worth noting, for anybody reading along and thinking of using multiple Teslas for that: if you're using multiple GPUs to fit larger models, they don't run in parallel and give better performance. In fact, you get lower performance, because it's effectively two GPUs trying to act as one, and communicating across the PCIe bus is much slower than a GPU's internal fabric.

How does that work with the K80 cards since they are dual GPU?

digitalscream · November 4, 2024

5 hours ago, TubsAlwaysWins said:

How does that work with the K80 cards since they are dual GPU?

I have no idea - nobody really bothers with the Kepler and Maxwell dual-GPU cards for the workloads that I'm interested in, because they're missing a lot of the feature support required by llama.cpp for the really memory-intensive AI applications. Pascal (i.e. the 10x0 generation) is really the oldest generation that even mostly gets there. Maxwell Teslas are about half the speed of Pascal, and Kepler cards are even slower.

That said, I think they use something similar to the old SLI bridges to do the job. Don't quote me on that, though.
