Can Linux do this by default?

I have a couple of CSV files that I would like to sort, deduplicate, and split, and then keep adding to as more data is generated.

example

data  data  data  1         data 

data  data  data  2         data

data  data  data  5         data

data  data  data 10000  data

I know I can combine these and do what I want manually, but my understanding is that this merges everything into one large file and then splits it.

Is there a way to split the files now, then run a command to add more data later?

example

 

1-500.csv

data  data  data  1         data 

data  data  data  2         data

future input

future input

data  data  data  5         data

future after this

 

501-1000.csv

Nothing would be here until some data is generated where column 4 has a number between 501 and 1000.

 

1001-1500.csv and so on. Everything is empty.

 

9501-10000.csv

Only one line, which would be the last one:

data  data  data 10000  data

 

Later I get a new.csv file, and in it is:

data  data  data  345  data (the program would see this and add it to 1-500.csv only)

data  data  data  5      data (the program sees this is already in 1-500.csv and drops the line)

The data could potentially run to billions of lines, and I don't want to combine everything, sort, deduplicate, and split again every time new data needs to be entered. That would be time consuming and could produce a massive file.

You can do this with scripts pretty easily. Another option is to load all the CSV files into something like R, process them there, then export the result back out as CSV.
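
To give an idea of what such a script could look like, here is a minimal Python sketch, not a finished tool. It assumes things the thread does not confirm: fields are separated by whitespace, column 4 is always a positive integer, there is no header row, and buckets are 500 wide as in the 1-500.csv example. The names `bucket_name` and `merge_new_file` are made up for illustration. Rows are rewritten with single spaces between fields so that differently padded copies of the same row count as duplicates.

```python
import os
from collections import defaultdict

BUCKET_SIZE = 500  # assumed from the 1-500.csv / 501-1000.csv naming


def bucket_name(key, size=BUCKET_SIZE):
    """Return the bucket file name for a key, e.g. 345 -> '1-500.csv'."""
    low = (key - 1) // size * size + 1
    return f"{low}-{low + size - 1}.csv"


def normalize(line):
    """Collapse runs of whitespace so padded duplicates compare equal."""
    return " ".join(line.split())


def merge_new_file(new_path, out_dir, key_col=3, size=BUCKET_SIZE):
    """Add rows from new_path to their bucket files, dropping duplicates."""
    os.makedirs(out_dir, exist_ok=True)

    # Group incoming rows by the bucket they belong to.
    pending = defaultdict(set)
    with open(new_path) as f:
        for line in f:
            fields = line.split()
            # Skip blank or malformed rows (assumption: these are not wanted).
            if len(fields) <= key_col or not fields[key_col].isdigit():
                continue
            key = int(fields[key_col])
            pending[bucket_name(key, size)].add(" ".join(fields))

    # Only the buckets that received new rows are read and rewritten.
    for name, rows in pending.items():
        path = os.path.join(out_dir, name)
        if os.path.exists(path):
            with open(path) as f:
                rows |= {normalize(l) for l in f if l.strip()}
        ordered = sorted(rows, key=lambda r: (int(r.split()[key_col]), r))
        with open(path, "w") as f:
            f.writelines(r + "\n" for r in ordered)
```

Calling `merge_new_file("new.csv", "buckets")` each time a new file arrives would touch only the bucket files whose ranges appear in that file, so nothing has to be combined into one large file first.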

8 minutes ago, Electronics Wizardy said:

You can do this with scripts pretty easily. Another option is to load all the CSV files into something like R, process them there, then export the result back out as CSV.

I've got a few months to figure this out. While I'm not a Linux noob, I'm not experienced enough for scripts or this R you are talking about.

 

Just now, intertan said:

I've got a few months to figure this out. While I'm not a Linux noob, I'm not experienced enough for scripts or this R you are talking about.

 

You might want to look into R. It's pretty powerful for data manipulation, and there are lots of guides out there.

I am pretty sure the best way to deal with all of that is to move your data into a database such as MySQL, MongoDB, or whatever flavor tickles your fancy.
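
As a rough illustration of the database route, here is a small sketch using SQLite through Python's built-in `sqlite3` module. SQLite is swapped in here only because it needs no server; it is not one of the databases named above, and whether it copes with billions of rows on your hardware is something you would have to test. The table layout and the helper names (`open_store`, `add_lines`, `rows_in_range`) are assumptions made for this example, as is the whitespace-separated format with the key in column 4.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store; the UNIQUE constraint rejects duplicates."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS rows (
               key  INTEGER NOT NULL,
               line TEXT    NOT NULL,
               UNIQUE (key, line)
           )"""
    )
    return conn


def add_lines(conn, lines, key_col=3):
    """Insert rows; ones already present are silently ignored."""
    parsed = []
    for line in lines:
        fields = line.split()
        # Skip blank or malformed rows (assumption: these are not wanted).
        if len(fields) > key_col and fields[key_col].isdigit():
            parsed.append((int(fields[key_col]), " ".join(fields)))
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO rows (key, line) VALUES (?, ?)", parsed
        )


def rows_in_range(conn, low, high):
    """Return rows whose key is between low and high, sorted by key."""
    cur = conn.execute(
        "SELECT line FROM rows WHERE key BETWEEN ? AND ? ORDER BY key, line",
        (low, high),
    )
    return [row[0] for row in cur]
```

With this layout there are no bucket files to maintain: new data is fed to `add_lines`, and something like `rows_in_range(conn, 1, 500)` pulls out what would have been 1-500.csv whenever it is needed.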
