Jump to content

.csv databasing and some extras.

Malscythe

FYI: head to the second paragraph for a visual aid

Hey guys! I've finally found some answers to my previous questions and have finally bin able to start the project I needed to get too work! I've gotten about 25% of the way to a proper working version but some of the things I am running into includes, creating and writing to a .csv file as well as everytime new information is entered in, it will add it automatically to said .csv file, a way to parse through the .csv file as kind of as a data look up but also is it possible to write to a different .csv file if I wanted to seperate the information from one database to another, I have included a image down below. Just a heads up everything is working at this time except for this .csv database. I'm not askign for an exact line of code on how all this will work but I am asking for a kick in the write direction where I should be looking for in documentation and all that sorts of fancy stuff.

 

For all those who cannot make sense of what I am asking about in the above section this is a visual aid with some information on what the hell I am talking about.

 

Green Box - So for this box I want this information to be sent to a different database that I can look through afterwards that it will always save to assuming no other similiar information is present 

ie.) Information is entered --> Checks to see if there is any other similiar information, like the name and the number --> Saves new information to the file or if it is  already present it ignores saving information to the file.

 

Red Box - For this box we can just ignore it as I already have this part saving into the Blue Box with no current issues, but I did draw it in to show you where the information was coming from for Blue Box

 

Blue Box - Now this information I want too be able to save this into a different .csv file from what Green Box is saving its information too. It would be nice is for every time the program is loaded you have an option to load a .CSV file or create a new one.

 

 

Although I do understand this might not be the best and fastest solution it may be just one file that is the best and the fastest. 

I thank you all for any help you can provide and insight on what should be optimized and maybe even suggest a new way to go about tackling my database issue. 

 

 

 

Program with diagram.png

Link to comment
Share on other sites

Link to post
Share on other sites

I guess I should have put this in before, I know very little about databases, especially .csv databasing I know even less about as I did a tiny bit of MySQL back in the day for Garry's Mod LUA

Link to comment
Share on other sites

Link to post
Share on other sites

3 minutes ago, Ecchi said:

Can you change the color of the black text? Its hard to read on dark mode. 

Ya sorry about that I was trying to switch back and forth between colors. Also thought LTT forums gave me a hentai virus because I saw this f9bd3d25ef.png

 

Link to comment
Share on other sites

Link to post
Share on other sites

 What programming language?

 

A CSV file is NOT a database, it's just a text export of some information.

 

You have values separated by comma, and a whole record is separated by the next one using ENTER (0x0D + 0x0A or just 0x0A)

 

The most basic version assumes you don't have comma characters in any value or the ENTER character. So, a valid CSV file would be

value1,value2,value3<enter>

x,y,z

 

To read the file, you'd simply have to search for the ENTER characters, extract the segment of text (ex "value1,value2,value3") , then separate the segment of text into values.

 

This is often NOT enough.

 

If a value contains comma or ENTER characters, you wrap the value into either single quotes ( ' ' )  or double quotes ( " " )  or other combination of character (but single quotes and double quotes are the most common.

The separator character will also have to be escaped by placing a \ character in front

So for example, this is a row with 5 values:

 

first_value, 'Smith, John', 'Mary\'s Boat','A long text with ENTER characters
that is surrounded by quote characters',fifth value
-SECOND ROW STARTS HERE-

In this case, you basically have to read character by character and make your own parser.

 

Depending on programming language you use, there may be libraries that read or write CSV files, so you don't have to deal with it.

 

You also have the option of export TSV files, which are basically CSV files but you use Tab character (0x09) to separate values.

As Tab character is much less used in values, you could get by without having to use quotes or double quotes to wrap values that contain comma in them.

 

Link to comment
Share on other sites

Link to post
Share on other sites

1 hour ago, mariushm said:

 What programming language?

 

A CSV file is NOT a database, it's just a text export of some information.

 

You have values separated by comma, and a whole record is separated by the next one using ENTER (0x0D + 0x0A or just 0x0A)

 

The most basic version assumes you don't have comma characters in any value or the ENTER character. So, a valid CSV file would be

value1,value2,value3<enter>

x,y,z

 

To read the file, you'd simply have to search for the ENTER characters, extract the segment of text (ex "value1,value2,value3") , then separate the segment of text into values.

 

This is often NOT enough.

 

If a value contains comma or ENTER characters, you wrap the value into either single quotes ( ' ' )  or double quotes ( " " )  or other combination of character (but single quotes and double quotes are the most common.

The separator character will also have to be escaped by placing a \ character in front

So for example, this is a row with 5 values:

 


first_value, 'Smith, John', 'Mary\'s Boat','A long text with ENTER characters
that is surrounded by quote characters',fifth value
-SECOND ROW STARTS HERE-

In this case, you basically have to read character by character and make your own parser.

 

Depending on programming language you use, there may be libraries that read or write CSV files, so you don't have to deal with it.

 

You also have the option of export TSV files, which are basically CSV files but you use Tab character (0x09) to separate values.

As Tab character is much less used in values, you could get by without having to use quotes or double quotes to wrap values that contain comma in them.

 

That is most interesting I just always thought the .CSV file was just an excel database as it was always presented as a database to myself where ever I looked.

Link to comment
Share on other sites

Link to post
Share on other sites

1 hour ago, Malscythe said:

That is most interesting I just always thought the .CSV file was just an excel database as it was always presented as a database to myself where ever I looked.

Nah, CSV is litterally short for "comma separated values". Go ahead and open a .csv file in Notepad. It's just text, separated by commas and some quotes to escape commas and line breaks. It's not a database. It's fine if you have little data, but it's going to be very bad in terms of performance if you have a lot of data. There's no built-in way to express relationships, keys, indexes, foreign keys etc.

 

You can, of course, treat a .csv file as a table and have one .csv file for each table and then use some column as a foreign key. But it'll be very slow when you have large amounts of data, since you have nothing like an index that a proper database uses to speed up queries. In the long run, you'll be better off with a proper database.

 

In terms of libraries etc., it might be better to use something like JSON. I think it's easier to work with (more "standardized" in a way). It's still plain text, so you can still look at the files in a text editor, but it's usually better supported. But again, not the best in terms of performance compared to a real database, if you're dealing with lots of data.

 

Here's an example of what a JSON might look like:

{
    "name": "Some name",
    "bandNumber": "Some number",
    "purchases": [
        {
            "purchased": "whatever",
            "pricePerLiter": 1.48,
            "tax": 19.0,
            "totalLiters": 5345,
            "datePurchased": <timestamp>
        },
        {
            "purchased": "whatever",
            "pricePerLiter": 1.48,
            "tax": 19.0,
            "totalLiters": 987,
            "datePurchased": <timestamp>
        },
        {
            "purchased": "whatever",
            "pricePerLiter": 1.48,
            "tax": 19.0,
            "totalLiters": 686,
            "datePurchased": <timestamp>
        },
        ...        
    ]
}

The big advantage I see is that a JSON can contain "objects" and "collections", which means you can use it to express simple relationships. For example, an object can contain a collection, which in turn can contain other objects and collections.

 

You also have primitive support for different data types, because a JSON can contain strings, numbers, booleans and null. In lots of programming languages you have libraries available that can directly map objects and collections to and from JSON (for example Java has Jackson and GSON and probably lots more).

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

31 minutes ago, Eigenvektor said:

 


{
    "name": "Some name",
    "bandNumber": "Some number",
    "purchases": [
        {
            "purchased": "whatever",
            "pricePerLiter": 1.48,
            "tax": 19.0,
            "totalLiters": 5345,
            "datePurchased": <timestamp>
        },
        {
            "purchased": "whatever",
            "pricePerLiter": 1.48,
            "tax": 19.0,
            "totalLiters": 987,
            "datePurchased": <timestamp>
        },
        {
            "purchased": "whatever",
            "pricePerLiter": 1.48,
            "tax": 19.0,
            "totalLiters": 686,
            "datePurchased": <timestamp>
        },
        ...        
    ]
}

 

I never really thought about JSON, as I don't really have any information on it, would I be able to integrate this in with C# or would I be looking at a more round about way of getting this done, and would you be able to direct me into sources so I can start learning how it works, how to read/write and search for information.

Link to comment
Share on other sites

Link to post
Share on other sites

2 minutes ago, Malscythe said:

I never really thought about JSON, as I don't really have any information on it, would I be able to integrate this in with C# or would I be looking at a more round about way of getting this done, and would you be able to direct me into sources so I can start learning how it works, how to read/write and search for information.

Sure, there's libraries for dealing with Json in C#, too. Here's a fairly popular one: https://www.newtonsoft.com/json

 

I think .Net even has something built-in these days, but last I remember it wasn't the fastet.

 

The basic idea is that you can serialize your hierarchy of data objects to Json and then write that to a file.

 

You can load the Json from a file and then deserialize that into C# objects again.

 

As I said, Json is essentialy just text, so you can easily read and modify these files for test purposes.

 

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

22 minutes ago, Eigenvektor said:

Sure, there's libraries for dealing with Json in C#, too. Here's a fairly popular one: https://www.newtonsoft.com/json

 

I think .Net even has something built-in these days, but last I remember it wasn't the fastet.

 

The basic idea is that you can serialize your hierarchy of data objects to Json and then write that to a file.

 

You can load the Json from a file and then deserialize that into C# objects again.

 

As I said, Json is essentialy just text, so you can easily read and modify these files for test purposes.

 

Thanks a ton, this shall be an interesting 6 week development project can't wait to see it through.

Link to comment
Share on other sites

Link to post
Share on other sites

1 hour ago, Eigenvektor said:

Here's another option you could try:

https://www.litedb.org/

Thanks! I think I have decided on how to approach this project. I think I will do daily transnational logs in different .CSV files, and store customer data in a REAL database.

As why just transactions on average we can do about 15-50 tax exempt sales a day.

Link to comment
Share on other sites

Link to post
Share on other sites

22 minutes ago, Malscythe said:

Thanks! I think I have decided on how to approach this project. I think I will do daily transnational logs in different .CSV files, and store customer data in a REAL database.

As why just transactions on average we can do about 15-50 tax exempt sales a day.

You would want to use a database management system(DBMS) instead of directly storing stuffs into .json, cvs, or whatever else files. 

Most popular and free DBMS are mysql for relational database and mongodb for nosql database. Let these specialize softwares handle storing persistent datas for you. They offer features like indexing and sharding which are quite handy for scalability. I literally made a web app off mongodb that could handle user traffics as large as facebook and twitter by load balancing and sharding my database across multiple servers. Holy freakin shit.... 

Sudo make me a sandwich 

Link to comment
Share on other sites

Link to post
Share on other sites

27 minutes ago, mariushm said:

For such a low volume, you could use SQLite as a database, here's a tutorial for C# : http://zetcode.com/csharp/sqlite/

 

You can open the database with various tools, like https://sqlitebrowser.org/

 

Has the additional benefit of the database being a single file on disk, which you can copy and make backups at any point.

Ya I guess I could do it that way, but I only really need to get these logs routed into my custom impact driver printer and then print off a total sheet onto a receipt printer

so it would just print out all the liters per transaction and the dollar amount per transaction and then total it at the end of the receipt.

and I didn't mean one .csv file per transaction I mean 1 .csv file for each day of the week

 

32 minutes ago, wasab said:

You would want to use a database management system(DBMS) instead of directly storing stuffs into .json, cvs, or whatever else files. 

Most popular and free DBMS are mysql for relational database and mongodb for nosql database. Let these specialize softwares handle storing persistent datas for you. They offer features like indexing and sharding which are quite handy for scalability. I literally made a web app off mongodb that could handle user traffics as large as facebook and twitter by load balancing and sharding my database across multiple servers. Holy freakin shit.... 

I'm really not running a internet based service here, just a local offline database that needs to hold onto some information for simplicity.

Link to comment
Share on other sites

Link to post
Share on other sites

4 hours ago, Malscythe said:

I'm really not running a internet based service here, just a local offline database that needs to hold onto some information for simplicity.

@wasab isn't wrong though. An actual DBMS has many advantages. It's not quite as simply to set up as the one for C# I linked above, but on the flip side you gain speed and reliability. I'm sure you won't need the advanced features like cluster support, sharding, indexing etc. right away, but they're there if and when you do. You also get built-in support for backups and a powerful query language to boot. How long until someone comes along and wants to know "How many liters did we sell between last year and now, on average?"

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

5 minutes ago, Eigenvektor said:

@wasab isn't wrong though. An actual DBMS has many advantages. It's not quite as simply to set up as the one for C# I linked above, but on the flip side you gain speed and reliability. I'm sure you won't need the advanced features like cluster support, sharding, indexing etc. right away, but they're there if and when you do. You also get built-in support for backups and a powerful query language to boot. How long until someone comes along and wants to know "How many liters did we sell between last year and now, on average?"

As for the ltiers per year on average, the government... The Canadian Government wants to know... always

Link to comment
Share on other sites

Link to post
Share on other sites

CSV is not less a database than MySql. It doesn't have all the feature and there is no engine by default but if you build the engine that manage the different CSV that retrieve the data you made yourself a Database CSV. CSV will contain information for a single table. I can mention out of my mind quickly most Bix phone system use CSV database, Ragnarok Online (a mmorpg game) use CSV database for custom servers. Government climate data are either given as CSV or live read. This is a category called second gen database. It as a database system as RavenDB is. Both save text file, both column values are split by commas, one split rows with brackets the other an enter key.

 

That being said CSV might not be the easiest to work with and other format such as XML and JSON would make the conversion from file to class object much easier to work with. CSV, JSON and XML can all be lazy loaded but depending on your language some might already be implemented (usually XML / JSON)

Link to comment
Share on other sites

Link to post
Share on other sites

So it looks like this is just for a personal project/school assignment.  If so then that gives you a lot more options to mess around, make mistakes and learn things (which is always great!).

 

However, on the off chance this is being used for a system that might potentially go into production/be sold, then it is critical you use a fully fledged and independent Database system, such as MySQL, Microsoft SQL Server, etc.    And I know I'm going to get crap for saying this, but SQLlite and similar database libraries that aren't independent systems and which integrate directly into your program are not a substitute.  They have their purpose and are great for what they do (and are wonderful for learning), but they aren't a substitute for MySQL/SQL Server and others.

 

If this is just a personal project/school assignment, then SQLlite or similar projects would be a good starting point.  There is a bit of barrier to entry if this is your first time working with them and/or SQL, but if you already have a specific goal in mind, you should be able to see how it easily applies to your project.  If you haven't worked with SQL before, I would recommend going through W3School's SQL Tutorial

 

To be clear, I do think understanding how to work with CSV, JSON, & XML files is still very valuable.  I have to work with all 3 all the time.  The primary difference is they're generally designed to be static files (i.e. once they're generated, they don't change).  CSV's can be easily appended.  Now yes, of course all 3 can be modified, and there are programs to do so.  Usually though what ends up happening is the entire file will be read into memory, parsed into a data structure, modified, and then converted back to text and stored in the file.  The problem is, this doesn't scale very well, so the more data you store, the slower and slower it gets.

 

This is the primary difference between CSV, JSON, XML files and an actual database: Databases are designed to be scalable, for both reading and writing data.  This way you can have a database that is terabytes large, but doesn't require it to be run on a computer with terabytes of memory.

 

Modern Databases have a huge list of other features as well, features that you very well might not need now, but likely will in the future, which is the reason I said that if this is for a production system, you should use an existing one, rather than trying to make your own from scratch.  The company I work for has quite a few clients who use programs designed by some smartass dev who thought he knew better and implemented their own method for storing data, and ended up shooting themselves in the foot, because their "database" ended up hampering a bunch of ideas for upgrades, or integrating additional programs because there was just no way for them to be compatible (not to mention usually ended up being very slow the larger they got).

 

Sorry, bit longer than I meant it to be.  Bit of a sore spot with me as I've had to deal with lots of situations like this in the past.

Link to comment
Share on other sites

Link to post
Share on other sites

2 hours ago, JacobFW said:

So it looks like this is just for a personal project/school assignment.  If so then that gives you a lot more options to mess around, make mistakes and learn things (which is always great!).

 

However, on the off chance this is being used for a system that might potentially go into production/be sold, then it is critical you use a fully fledged and independent Database system, such as MySQL, Microsoft SQL Server, etc.    And I know I'm going to get crap for saying this, but SQLlite and similar database libraries that aren't independent systems and which integrate directly into your program are not a substitute.  They have their purpose and are great for what they do (and are wonderful for learning), but they aren't a substitute for MySQL/SQL Server and others.

 

If this is just a personal project/school assignment, then SQLlite or similar projects would be a good starting point.  There is a bit of barrier to entry if this is your first time working with them and/or SQL, but if you already have a specific goal in mind, you should be able to see how it easily applies to your project.  If you haven't worked with SQL before, I would recommend going through W3School's SQL Tutorial

 

To be clear, I do think understanding how to work with CSV, JSON, & XML files is still very valuable.  I have to work with all 3 all the time.  The primary difference is they're generally designed to be static files (i.e. once they're generated, they don't change).  CSV's can be easily appended.  Now yes, of course all 3 can be modified, and there are programs to do so.  Usually though what ends up happening is the entire file will be read into memory, parsed into a data structure, modified, and then converted back to text and stored in the file.  The problem is, this doesn't scale very well, so the more data you store, the slower and slower it gets.

 

This is the primary difference between CSV, JSON, XML files and an actual database: Databases are designed to be scalable, for both reading and writing data.  This way you can have a database that is terabytes large, but doesn't require it to be run on a computer with terabytes of memory.

 

Modern Databases have a huge list of other features as well, features that you very well might not need now, but likely will in the future, which is the reason I said that if this is for a production system, you should use an existing one, rather than trying to make your own from scratch.  The company I work for has quite a few clients who use programs designed by some smartass dev who thought he knew better and implemented their own method for storing data, and ended up shooting themselves in the foot, because their "database" ended up hampering a bunch of ideas for upgrades, or integrating additional programs because there was just no way for them to be compatible (not to mention usually ended up being very slow the larger they got).

 

Sorry, bit longer than I meant it to be.  Bit of a sore spot with me as I've had to deal with lots of situations like this in the past.

Okay so, I should use a precompiled database? Or am I getting the wrong idea as in, I should just use SQL to create my own.

Link to comment
Share on other sites

Link to post
Share on other sites

39 minutes ago, Malscythe said:

Okay so, I should use a precompiled database? Or am I getting the wrong idea as in, I should just use SQL to create my own.

People are just telling you that static file is usually not something you would want to use for storing transaction datas like you are doing. Besides the inefficiency, you also need to manually write the parser yourself. Why would you if you can just use sql to sort through the data like getting the averages, specific item, count, and such? 

 

You need to learn SQL and then a SQL DBMS. sqlite is good for small applications which do not require a standalone dedicated DBMS.

If you are talking about creating your own SQL DBMS, you can forget about it. You will have no idea of how much work is involve, it is nothing like just parsing a file and then read and write to it like you are thinking. You can study the entire theory behind it like relational algebra, B+ tree, database algorithmic complexity, ACID properties, ect and still have not a single clue on how to actually design one.  

Sudo make me a sandwich 

Link to comment
Share on other sites

Link to post
Share on other sites

43 minutes ago, Malscythe said:

Okay so, I should use a precompiled database? Or am I getting the wrong idea as in, I should just use SQL to create my own.

Bit of a mix up in communication here.  

The Database Manager is the software that you will usually interact with.  It is the program that will actually make the low level changes to any and all files that it uses to store data in.  You use SQL to give it instructions on what changes you want to make, and it then handles all the nitty gritty details. 

 

The Database itself refers to the actual data you are storing.

 

So SQL isn't a language used to make a database manager, it's the language you use to interact with the Database Manager.  Most of the modern day databases will provide you with a UI so you don't have to use SQL for everything, but you will still need to know SQL if you want your program to be able to be able to get data and/or make changes to the database.

 

When you first start up the database manager there will be no data at all.  So to use some examples from your situation, you would start off by creating the database, and probably call it something like InventoryMgmt.

  • Next you would create a table designed to store information about the person in the transaction, so you would probably name the table Person
  • Next in the Person's table, you would need to add a few columns.  Likely you would start off with FirstName, LastName, & BandNumber. 
  • Next you would create a table for the transactions, so obviously just call it Transactions (note, that might be a reserved name, similar to how int is reserved in C/C++ and a lot of other languages).
  • Now in the Transactions table, you would start off by adding a column called BandNumber.  This let's you link the individual who's stored in the Person table, to any and all transactions they've made.  You'd want to do it this way, as the bandnumber would likely be an integer, which can be stored much more compactly than if you had to just reenter they're name every time (which would also likely lead to typos, making it harder to figure out if they're the same person or not).
  • Next you would a column to store the date/time of the transaction, and a column each for the quantity purchased, the price per liter, liters w/tax, and the final amount.

 

Once you have that setup, then you can start writing the code for your program that can add people + transactions to the system.

 

 

 

 

 

 

Link to comment
Share on other sites

Link to post
Share on other sites

9 hours ago, JacobFW said:

 

Once you have that setup, then you can start writing the code for your program that can add people + transactions to the system.

 

 

 

 

 

 

Okay that makes a lot of sense actually, but my next issue is I see a lot of people always saying "We are going to make these two tables communicate to each other". So far I haven't seen anything about not talking to each other at all, I just want the transactions and customer info to be seperate in the database, I have no reason for them to be together. Is SQL still my best choice than? Do I have to make both tables talk?

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×