
big data ANN, low storage!!!

farhang amaji

Hi,

I hope you can help me. Does what I have in mind have a routine solution, or do I have to build all of it myself?

I want to work with 50 GB of unique data (100 series of 250,000 numbers each), and from each of those 100 series I want to derive at least 200 new series (20,000 series in total, approximately 10,000 GB). Then I want to create and train an ANN on it, probably with a Python machine learning framework (PyTorch, TensorFlow, ...).

How can I do this with no more than about 200 GB of storage? (That figure is approximate; I mean far less than 10,000 GB.)

Is it possible? Or should I write an ANN from scratch and adapt it so that each data evaluation (1 of 250,000 per epoch) reads just 100 numbers (one from each of the 100 series), creates the 19,900 other numbers from them, and repeats this 250,000 times to finish one epoch, then does the same for further epochs until training of the ANN is finished?
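The per-sample scheme described above can be sketched with a plain Python generator, so that derived values exist only while one row is being processed and are never written to disk. This is a minimal sketch, not a definitive implementation: `derive` is a hypothetical placeholder for whatever transformation produces the new series, and `iter_samples` is an illustrative name, not part of any framework.

```python
from typing import Callable, Iterator, List, Sequence


def derive(value: float, n_new: int) -> List[float]:
    """Hypothetical stand-in: build n_new derived values from one base value."""
    return [value * (k + 2) for k in range(n_new)]


def iter_samples(
    base: Sequence[Sequence[float]],
    n_new: int,
    derive_fn: Callable[[float, int], List[float]] = derive,
) -> Iterator[List[float]]:
    """Yield one expanded row per time step; derived values are never stored."""
    n_steps = len(base[0])
    for t in range(n_steps):
        row: List[float] = []
        for series in base:
            row.append(series[t])                    # the stored base value
            row.extend(derive_fn(series[t], n_new))  # values created on the fly
        yield row


# Toy data: 3 base series of 5 steps, 2 derived values per base value.
base = [[1.0, 2.0, 3.0, 4.0, 5.0],
        [10.0, 20.0, 30.0, 40.0, 50.0],
        [0.5, 1.5, 2.5, 3.5, 4.5]]
rows = list(iter_samples(base, n_new=2))
print(len(rows), len(rows[0]))
```

With 100 base series and `n_new=199`, each row would hold 100 stored values plus 19,900 derived ones, and one pass over the generator corresponds to one epoch. Existing frameworks can consume such a stream without a from-scratch network; in PyTorch, for example, a generator like this is typically wrapped in a `torch.utils.data.IterableDataset`.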


I think your math is off. 10,000 GB is 1000 TB, which is 1 petabyte. It would take a VERY long time to train on that much data unless you, quite literally, are using a supercomputer. 

 

If your numbers of 100 sets of 250,000 numbers is correct, and we assume that a number is a 32 bit (4 byte) value, then you actually only have 100 MB. Then, when you make your 200 series per series, you only have 200 times as much data, or 20 GB total.

ENCRYPTION IS NOT A CRIME


15 minutes ago, straight_stewie said:

I think your math is off. 10,000 GB is 1000 TB, which is 1 petabyte. It would take a VERY long time to train on that much data unless you, quite literally, are using a supercomputer. 

 

If your numbers of 100 sets of 250,000 numbers is correct, and we assume that a number is a 32 bit (4 byte) value, then you actually only have 100 MB. Then, when you make your 200 series per series, you only have 200 times as much data, or 20 GB total.

First, I am not sure about the size of 100 sets of 250,000 numbers; the 50 GB figure is only an example throughout this post.

Second, 10,000 GB is not 1000 TB; it's 10 TB.


1 minute ago, farhang amaji said:

Second, 10,000 GB is not 1000 TB; it's 10 TB.

That's right. I guess I added an extra zero when I did it on my calculator. 

 

In either case, I believe your dataset size estimate is still far off. For 20,000 sets of 250,000 numbers you have 5 billion numbers. If they are ints (32 bits), that's only 20 billion bytes, or 20 GB. If they are 64-bit numbers, that's only 40 GB. If you're using 128-bit floating point numbers, that's only 80 GB of data.
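These estimates are easy to check directly. The sketch below uses the 250,000 values per series from the opening post and decimal gigabytes (1 GB = 10^9 bytes); the function names are illustrative, and `series_len` can be changed to test other assumptions.

```python
def dataset_bytes(n_series: int, series_len: int, bytes_per_value: int) -> int:
    """Raw size of n_series series of series_len values, ignoring file overhead."""
    return n_series * series_len * bytes_per_value


def as_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 1e9


# Base data: 100 series of 250,000 32-bit values.
print(f"base, 32-bit: {as_gb(dataset_bytes(100, 250_000, 4)):.1f} GB")

# Expanded data: 20,000 series at 32-, 64- and 128-bit widths.
for width in (4, 8, 16):
    size = as_gb(dataset_bytes(20_000, 250_000, width))
    print(f"expanded, {width * 8}-bit: {size:.0f} GB")
```

Under these assumptions even the widest case stays well below the 200 GB storage budget mentioned in the question.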

 

With a data set that small you shouldn't really need to worry about manually paging or anything. Your database or whatever file handling library you are using should be able to handle that with ease. 


I think this is a good case of trying to run before you can walk.

 

Before you attempt the main project you have in mind, you need to get some experience using existing ANN tools, in order to see what they're capable of and whether they'll suit your needs, or whether you need to use another language or framework.

 

In regards to the number of training iterations, honestly this isn't the best place to ask, and the forums dedicated to Neural Networks are likely going to expect you to have a certain level of experience with the existing tools before they'll take you seriously (not trying to be mean; just the way it works in pretty much any specialized community).

 

You also need to understand that pretty much every Neural Network in a commercial environment has been tuned to an insane degree, both for speed and accuracy/precision, meaning that current frameworks aren't going to provide you with networks that have the same characteristics of the commercial ones, as the commercial ones likely use multiple frameworks, or were coded by hand to work the way the developers needed.

 

TL;DR get familiar with the existing frameworks.  Learn to walk before you can run.

 

 
