Jump to content

Video how to make a Script who put .csv Data and add them to a DB with Python?

Required

Hi

Well the start is the hardest part. Some People are happy with a blinking LED. Well today I need to start by doing some (i would say) simple task.

I just need to search a CSV File Column for Values in Data to put them into an (not existing) DB for later analyse and usage.
Since we are here in a Tech Forum lets take an HDD as example:

SEAGATE Ironwolf PRO Enterprise NAS HDD 16TB 7200rpm 6Gb/s SATA 256MBcache 8,9cm 3,5Zoll 24x7 für NAS und RAID Rackmount Systeme BLK

Here are many Information to get from the Data like its a HDD, 16TB ofr Storage, Sata Connection, 3,5 Zoll Size for NAS and it come in a "Bulk" Package.
Is there some simple and easy how to make a script to put all the information out of the Data? Sure I must define the search Parameter like what a HDD, SDD is or what different type of size the are.

Thanks

From AT. :x

Link to comment
Share on other sites

Link to post
Share on other sites

Depends...

If the data is available in a structured format that would make things easier.

If the format is not given or well defined then, well..  dann viel Glück damit! It's not impossible but more complex and error prone. 😕

ಠ_ಠ

Link to comment
Share on other sites

Link to post
Share on other sites

15 minutes ago, shadow_ray said:

If the data is available in a structured format that would make things easier.

Yes its a "normal" CSV File. So that Line I posted can be "Analysed" without any Problem.

 

16 minutes ago, shadow_ray said:

and error prone.

Well I have time to deal with it.

 

I guess that I have define each Filter by Hand. My idea is maybe stupid but I would have a Positive and a Neutral List, DB,...

Like the Line I showed before:
 

44 minutes ago, Required said:
SEAGATE Ironwolf PRO Enterprise NAS HDD 16TB 7200rpm 6Gb/s SATA 256MBcache 8,9cm 3,5Zoll 24x7 für NAS und RAID Rackmount Systeme BLK

Known Values should be moved from the "neutral" to the Positive List like:
HDD, 16TB, 7200rpm, 6Gb/s, SATA, 256MBcache,"3,5Zoll", 24x7, BLK

So what is left over is perfect for me to create new Filter out that Data.

 

There could be maybe one DB who the Line I show above is loading into.

If that work I want make my own "GUI" to make a better search function than my IT Distributor offer me and add some features like: Size of the Drive / € = price of each TB. (I know its not that easy since there are a lot of different Types, Warranty,... but thats why I need to use such Filter before.)

Thanks

From AT. :x

Link to comment
Share on other sites

Link to post
Share on other sites

8 hours ago, Required said:

Yes its a "normal" CSV File. So that Line I posted can be "Analysed" without any Problem.

 

Well I have time to deal with it.

 

I guess that I have define each Filter by Hand. My idea is maybe stupid but I would have a Positive and a Neutral List, DB,...

Like the Line I showed before:
 

Known Values should be moved from the "neutral" to the Positive List like:
HDD, 16TB, 7200rpm, 6Gb/s, SATA, 256MBcache,"3,5Zoll", 24x7, BLK

So what is left over is perfect for me to create new Filter out that Data.

 

There could be maybe one DB who the Line I show above is loading into.

If that work I want make my own "GUI" to make a better search function than my IT Distributor offer me and add some features like: Size of the Drive / € = price of each TB. (I know its not that easy since there are a lot of different Types, Warranty,... but thats why I need to use such Filter before.)

Thanks

Quick question, "3,5Zoll", are you from a country that utilizes commas to represent the decimal point?  [Or rather will your data be in that format].  That could create issues when working with csv files (speaking from experience here where I had commas used as thousands separators).

 

Anyways, to quickly answer your question

https://datatofish.com/import-csv-sql-server-python/

That link should help.  It gives an example of reading a CSV into an SQL database

 

Not sure if you would want to, but using MS Access might also be a viable option (depending if it will be just you using it or not though).  If it's something that just you will be using, I think Access could be a very very viable option (you can create GUI's, and accessing the data is really simple)...i.e. you have everything packed all together that way.

3735928559 - Beware of the dead beef

Link to comment
Share on other sites

Link to post
Share on other sites

13 hours ago, Required said:

Yes its a "normal" CSV File. So that Line I posted can be "Analysed" without any Problem.

If your CSV is a normal one then your data contain 3 columns with the following values :

 

Column 1 : SEAGATE Ironwolf PRO Enterprise NAS HDD 16TB 7200rpm 6Gb/s SATA 256MBcache 8
Column 2 : 9cm 3
Column 3 : 5Zoll 24x7 für NAS und RAID Rackmount Systeme BLK
Link to comment
Share on other sites

Link to post
Share on other sites

well, that line is not really a line that would be in a CSV file ... it's just coincidence that you have some "," characters

If I were to work on something like this, I'd start by pre-processing the data with some rules like  "Wherever there's a comma between two numbers, change to a dot" ... so that you'll have 3.5Zoll , 8.9cm  etc etc

 

Next I'd build a sort of dictionary of keywords like let's say "rpm" , cache , MB , GB, TB, nas, sata,  etc   ... if there's numbers space keyword then combine them  .. so 7200 rpm = 7200rpm

 

Your line is basically

manufacturer (in uppercase) ,

product model, 

specialization (NAS) ,

product type (HDD) ,

characteristics ( 16 tb, 7200 rpm , 6 gb/s , sata , 256MBcache, 8.9cm, 3.5" ,  24x7)

extra keywords - for SEO mostly, and some for user friendly  from "fur to almost the end"

Packaging type  BLK, retail, whatever

 

So I'd tokenize it  after you replace , with . and join numbers to keywords, then split the into chunks wherever there's spaces, then parse every token and look for those keywords in dictionary and extract number from token, then remove token from list ... at the end you end up with just the words you couldn't take out .. maker, model and that filler text.

Link to comment
Share on other sites

Link to post
Share on other sites

3 hours ago, Franck said:

If your CSV is a normal one then your data contain 3 columns with the following values :

Nope the Source File is a "real" CSV one. There other Data who can be 1:1 written into the DB like Price, Quantity, Product No, EAN,...

 

From AT. :x

Link to comment
Share on other sites

Link to post
Share on other sites

3 hours ago, Required said:

Nope the Source File is a "real" CSV one. There other Data who can be 1:1 written into the DB like Price, Quantity, Product No, EAN,...

Can you provide more than one example of possible values that can be present in the column your first example came from? I assume there's stuff other than just storage devices, let's say monitors, mice, … Unless all of that data follows some common rules you'll have a very hard time turning that into structured data without errors.

 

It's probably safer to move that string into a column named "description" and then present that data as given when the database is queried and search through it with a full-text search.

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×