
SSD for a math research database?

 

 

Hello Forum

Which SSD would you recommend for a math research database application? All suggestions would be very gratefully received.

Hardware:

·     Mac Pro 2019 / 16-core / 192 GB RAM

Database:

·     Software: XOJO with SQLite (flat tables, non-relational) / 70 tables in all / each table has only 7 columns (fields)

·     number of records in each table varies from c. 20,000 to 200,000,000 (two hundred million)

·     the larger tables take weeks/months of non-stop processing to generate 

·     largest single table will be 100 GB (estimate) / total size of DB is 1.2 TB (estimate)

·     single user application

Relative to budget, we need:

·      extremely fast read/write speeds for table generation/loading (Note: searches, sorting, manipulation occur in RAM)

·      excellent reliability / zero overheating (to avoid noise, preserve the Mac Pro fans…)  

 Budget:

·      circa $3,000 / £2,500 (but only if absolutely necessary to get the best... hopefully nothing like this much...)

SSDs (& multi SSD PCIe cards) under consideration:

·      Sonnet M.2 4x4 PCIe Card (Silent) + Samsung 970 EVO Plus M.2 SSD—overheating??

·      Sonnet Fusion Dual U.2 SSD + some U.2 Enterprise SSDs—more reliable, cooler?

·      Samsung PM1725B

·      Intel Optane 905P (not relevant?)

·      Other Enterprise M.2 or U.2 SSDs

 Questions:

·      Which SSD do you recommend? Why this one? Will it make much difference compared to other SSDs?

·      Is a high-end M.2 Consumer SSD sufficient?

·      Would U.2 Enterprise SSD be better?

·      Should we wait for next generation SSD?

·      Am I right that new gen (PCIe Gen 4) SSDs will not be compatible with Mac Pro PCIe slots?  

Truly knowledgeable advice greatly appreciated.

Thank you!

 


Only thing I might be able to add (and it doesn’t have a whole lot to do with SSDs) is that if those tables take weeks/months to generate you reeeaaaally don’t want to lose them, so it’s sounding like multilayer backup systems are exceptionally important.  Something corporate level.  As far as choosing an SSD goes, there’s missing data: it can be important with an SSD to know the number of writes it will get, how often the thing gets used, and what constitutes a lag issue for that system.

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


Thanks Bombastinator

 

Luckily the files are not toooo big, and we have a whole bunch of external hard drives for use as back-up. I also send the files to a cloud storage place in the sky. 😀  What's irritating is when it's decided we need another field, or the main programmer decides a change in file format would be more efficient... and everything has to be regenerated from scratch. The joys of research.

 

I'm no hardware expert (trying to learn...) but am more or less aware of the other parameters you mention. Regarding 'lag' issues, can you say anything that would help me understand what they are?  I mention the Optane SSD because it seems to have extremely low latency, and I'm unclear, for example, how in this scenario the Optane would compare, say, to the Samsung 970 EVO Plus. And how much difference this would make over a long period...

 

When generating the tables, the application is not writing to the drive all the time, but intermittently adding chunks. The time intervals at which it does this vary hugely depending on the 'math terrain' through which the program is travelling at a particular time.


Maybe even a RAM drive could be used, though SSDs or M.2 drives would be a cheaper and easier option.


7 hours ago, ageekhere said:

Maybe even a RAM drive could be used, though SSDs or M.2 drives would be a cheaper and easier option.

Thanks ageekhere. Well, in building one table, a subset of the records is generated in RAM (up to a certain number, or time) and is then written to disk; then (all in one continuous routine) another subset is generated and written, and so on, until the table is complete. Since this Mac Pro could theoretically hold up to 768 GB of RAM... maybe we should indeed revisit how this works, but we certainly can't afford quite that much RAM!
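
(For anyone who thinks in code: a minimal sketch of that generate-then-flush loop, in Python/sqlite3 rather than our actual XOJO, where generate_chunk() is a hypothetical stand-in for the maths routine. The point is that each chunk goes to disk in a single transaction.)

```python
import sqlite3

def populate_table(db_path, total_records, chunk_size=50_000):
    """Generate records in RAM in chunks, flushing each chunk in one transaction."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(a INTEGER, b INTEGER, c INTEGER, d INTEGER,"
        " e INTEGER, f INTEGER, g INTEGER)"  # 7 columns, like our tables
    )
    written = 0
    while written < total_records:
        n = min(chunk_size, total_records - written)
        chunk = generate_chunk(n)   # hypothetical: the maths happens here, in RAM
        with conn:                  # one transaction (one sync) per chunk, not per row
            conn.executemany("INSERT INTO results VALUES (?,?,?,?,?,?,?)", chunk)
        written += n
    conn.close()
```

One transaction per chunk means the drive sees large sequential bursts rather than millions of tiny synced writes, which is exactly the intermittent pattern described above.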

 

Ultimately, it is necessary to save the complete table/DB file. It can't just be held in RAM, because (given our research questions) it's at that point that the real work starts.  Good call, though! 😀


Two thoughts:

 

1. Are we kind-of asking an impossible question? Should we just buy the fastest SSD we can find—and try it? We were hoping to avoid that because the project has already cost a lot (it's a private project related to maths and music), and at this late stage in the process we really want, if possible, to avoid expensive mistakes!

 

2. Is it possible to send the question(s) to Linus's team?  

 

Thanks


57 minutes ago, Composer 133 said:

Two thoughts:

 

1. Are we kind-of asking an impossible question? Should we just buy the fastest SSD we can find—and try it? We were hoping to avoid that because the project has already cost a lot (it's a private project related to maths and music), and at this late stage in the process we really want, if possible, to avoid expensive mistakes!

 

2. Is it possible to send the question(s) to Linus's team?  

 

Thanks

Really fast SSDs are really expensive, and there are to some degree diminishing returns at the high end (just like anything). There is also consumer/hobbyist grade versus enterprise grade.  Part of the problem is that there are really a lot of things not given: how much data needs to be stored? How fast and how often does it need to be accessed?

The answers to those so far seem to have been “we’re not saying but we think a lot” and “we’re not saying at all”.

 

The real answer for both may seem to be “we don’t have a clue”. If this is the case, there are two ways to go: up from the bottom or down from the top. Up from the bottom would be: make sure EVERYTHING has multiply redundant off-site backup (there are experts on this one around.  I’m recalling one has an ostrich for an avatar but I can’t remember the name.  @LAwLz might also be one.  I am not), then get a big ol’ consumer-grade lowballer SATA SSD, possibly an array of them, and see how it goes.  If it’s cripplingly slow, the SSD system becomes slow on-site backup, and you go up the chain.  I’ve got no idea where down from the top even starts. I’m just a game player.  There are those that might, though.

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


1 hour ago, Bombastinator said:

Really fast SSDs are really expensive...

 

Hi Bombastinator. Thanks again for this. Must be a misunderstanding 😀

(As I say…) the total DB (database, i.e. all 70 tables, the complete shebang) = 1.2 TB (estimate) -- not that large.

Backup is not an issue at all.

The question in a nutshell (and I should have put it like this in the first place, my fault, my brain is fuzzy with it all) is:

We need really fast read and write speeds, but we also need the drive not to overheat if left running for very long periods (weeks). Therefore, do we really need to buy an expensive enterprise U.2 drive, or could we get away with a high-end consumer M.2? And, critically, can anyone recommend a drive that suits? (Ideally, tell us what experience it's based on.)

Thanks for making me clarify!   😀😇

 


1 hour ago, Bombastinator said:

The real answer for both may seem to be “we don’t have a clue”. If this is the case, there are two ways to go: up from the bottom or down from the top. Up from the bottom would be: make sure EVERYTHING has multiply redundant off-site backup (there are experts on this one around.  I’m recalling one has an ostrich for an avatar but I can’t remember the name.  @LAwLz might also be one.  I am not), then get a big ol’ consumer-grade lowballer SATA SSD, possibly an array of them, and see how it goes.  If it’s cripplingly slow, the SSD system becomes slow on-site backup, and you go up the chain.  I’ve got no idea where down from the top even starts. I’m just a game player.  There are those that might, though.

I sadly know very little about storage.

 

But do I understand the workload correctly?

You read a ton of data into RAM, do processing on it, and then write some tables to the drive, then repeat the process all over again, ending up with about 1.2 TB of data on the drive in total? Will you be reading a bunch of small data sets and then writing big (like 10+ GB) tables?

And you want great reliability and no risk of overheating running them at high usage constantly, right?

 

 

I think the 1.2 TB is kind of a problem because a lot of drives cap out at 1 TB.


28 minutes ago, Composer 133 said:

Hi Bombastinator. Thanks again for this. Must be a misunderstanding 😀

(As I say…) the total DB (database, i.e. all 70 tables, the complete shebang) = 1.2 TB (estimate) -- not that large.

Backup is not an issue at all.

The question in a nutshell (and I should have put it like this in the first place, my fault, my brain is fuzzy with it all) is:

We need really fast read and write speeds, but we also need the drive not to overheat if left running for very long periods (weeks). Therefore, do we really need to buy an expensive enterprise U.2 drive, or could we get away with a high-end consumer M.2? And, critically, can anyone recommend a drive that suits? (Ideally, tell us what experience it's based on.)

Thanks for making me clarify!   😀😇

 

Overheating is not an issue with SSDs in general.  There’s actually an argument that SSDs, and computer equipment in general (commercial or enterprise), like to be left on all the time.

U.2 should not be required for that reason.  What I would do is start with a cheapo 2 TB SATA SSD, which should be under a hundred bucks. The data on the drive, though, sounds like it takes tens of thousands to create, ignoring the problem of not having access to it for weeks, so off-site backup of some sort seems prudent (fires and idiots occur).  The problem with low-end consumer drives is they might not be fast enough.  They might be, though.  The upside of lowballing is that if it turns out to be too slow, it’s not much pricier than the 2 TB HD you’d need for backup anyway, so you don’t waste much if it doesn’t cut it.
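
(If you want to put a number on “fast enough” before spending more, something like this rough Python sketch will time sustained sequential writes. The /Volumes/TestSSD path is just a placeholder for wherever the drive under test is mounted.)

```python
import os
import time

def write_throughput_mb_s(path, total_mb=4096, block_mb=8):
    """Sequentially write total_mb of incompressible data to 'path' and time it."""
    block = os.urandom(block_mb * 1024 * 1024)   # random data defeats compression tricks
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb // block_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force it onto the drive, not just into the OS cache
    elapsed = time.perf_counter() - t0
    os.remove(path)
    return total_mb / elapsed

# e.g. point it at a scratch file on the drive under test (placeholder path):
print(round(write_throughput_mb_s("/Volumes/TestSSD/scratch.bin")), "MB/s")
```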

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


16 minutes ago, LAwLz said:

But do I understand the workload correctly?

 

Will you be reading a bunch of small data sets and then writing big (like 10+ GB) tables?

 

And you want great reliability and no risk of overheating running them at high usage constantly, right?

Thank you LAwLz, appreciated. It’s not quite like that.

In short: we’re populating the tables of a database; there are only 70 tables. The long routines are the process of populating (a huge amount of calculation). The smallest table has only c. 20,000 records; each table gradually gets bigger, up to the 70th and largest table, which has c. 200 million. The records are very small and compact, hence the (estimated) size of the TOTAL DB (all 70 tables together) is ‘only’ 1.2 TB.

The tricky bit (and maybe it would be better if it didn’t work this way, but it’s due to the history of the project and is hard to change for various complicated reasons…) is that the routine drops chunks of records onto the drive as it goes. How frequently this happens varies depending on whichever bit of the math terrain the program is passing through at the time (hope that metaphor works…). Sometimes it’s very frequent for a longish period, sometimes much less frequent.  Obviously it’s in the ‘frequent’ sections that we’d worry about overheating.

To the last part—we want speed (+reliability) but no overheating.

Does this clarify? 
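
(And in case it’s useful, here is a back-of-envelope check I tried on write endurance. Every figure below is a guess rather than a measurement, but it suggests our total writes would be a small fraction of what a typical 2 TB consumer drive is rated for.)

```python
# Back-of-envelope endurance arithmetic -- every figure below is an assumption/guess.
db_size_tb          = 1.2   # estimated total DB size (from above)
full_regenerations  = 50    # generous guess at how many times it all gets rebuilt
write_amplification = 2.0   # rough allowance for SQLite journaling + SSD overhead

total_writes_tb = db_size_tb * full_regenerations * write_amplification
rated_tbw = 1200            # e.g. a 2 TB 970 EVO Plus is rated at 1200 TBW

print(f"{total_writes_tb:.0f} TB written of {rated_tbw} TBW rated "
      f"= {100 * total_writes_tb / rated_tbw:.0f}% of endurance")
```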


51 minutes ago, Bombastinator said:

Overheating is not an issue with SSDs in general. 

 

 

For this project, SATA is a non-starter, just too slow. We have to minimise (to an affordable extent) time lost due to inadequate hardware. (It would almost certainly have been better to do this work on a PC, but sadly that's not an option.)

 

So we looked to M.2 -- and it turns out some of these do get very hot (a review of the Samsung 970 Evo Plus reports 93°C at extreme load).

 

And I guess Enterprise U.2 SSDs exist for a reason, no??
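
(If we do end up on M.2, I suppose we could at least log the drive temperature during a long run. A rough sketch, assuming smartmontools is installed and /dev/disk0 is a stand-in for the right device:)

```python
import datetime
import subprocess
import time

def log_drive_temperature(device="/dev/disk0", interval_s=60):
    """Poll smartctl and print any temperature lines it reports, with timestamps."""
    while True:
        out = subprocess.run(["smartctl", "-a", device],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            if "Temperature" in line:
                print(datetime.datetime.now().isoformat(timespec="seconds"),
                      line.strip())
        time.sleep(interval_s)
```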


33 minutes ago, Composer 133 said:

 

For this project, SATA is a non-starter, just too slow. We have to minimise (to an affordable extent) time lost due to inadequate hardware. (It would almost certainly have been better to do this work on a PC, but sadly that's not an option.)

 

So we looked to M.2 -- and it turns out some of these do get very hot (a review of the Samsung 970 Evo Plus reports 93°C at extreme load).

 

And I guess Enterprise U.2 SSDs exist for a reason, no??

There’s a breakdown of the particulars of SSDs and heat in one of the Linus videos you may find useful.  More or less, SSDs actually like heat, within certain bounds.  I looked for the video but couldn’t find it.
 

I’m afraid I don’t know much about U.2.  I’m not sure it’s much faster any more, though.  My memory is that U.2 predated M.2.  Mostly it’s a different connector.

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


2 hours ago, Bombastinator said:

There’s a breakdown of the particulars of SSDs and heat in one of the Linus videos you may find useful.  More or less, SSDs actually like heat, within certain bounds. 

 

Possibly this one: [embedded video].

 

Aside from the water cooling stuff, he says: running at too high a temperature will degrade the SSD over time; running at a median temperature is optimal speed-wise and longevity-wise; and running at too low a temperature is likely to be slow.  Just as you'd expect, really.

 

Maybe we just need the $7 water-cooler and some hosepipe 🤠

 

