Jump to content

A request for Database design advice

LukeTim

So, I am developing a webapp which collects betting odds from different sources and calculates the profitability of certain bets. When the user logs in, they can view a list of the days horse races (it's only horse racing at the moment, but will expand to other sports later), choose one, and then the app will display all the odds, and show how profitable they are, or also if they would make a loss.

My idea for how to do this would be to collect odds every 30 minutes or so using a python script and a cronjob or something (so as not to hit the external servers every time a user makes a HTTP request for the data) and put this into a database.

Of course, this means I will need to design a database schema. I have experience with managing and administrating databases but my experience of designing them is limited. So, I have thrown something together to make a start, but I feel like maybe I am missing some things. Perhaps there are opportunities to collect/store data that could be useful that I am not considering. So I thought I would ask here if anybody has any suggestions about what fields/tables/relationships I could add to this database... or if there are any changes to my design that could be made.


Attached is an image of the design in MySQL Workbench. Ignore the database types as they will be handled by the ORM. Also, all tables will have the created_at and updated_at fields, but I left them out since they are just metadata.

Screen Shot 2016-02-06 at 22.02.30.png

Link to comment
Share on other sites

Link to post
Share on other sites

check pastebin.com some amazing stuff one there! (which you probably already know but still....)

Link to comment
Share on other sites

Link to post
Share on other sites

When it comes to the User table, you may need something for password resets (like another table or some extra columns). Temporary password/reset token, expiry date, or whatever you need for however you implement it. You may choose to also record if an email is verified or not.

 

I'm not very familiar with horse racing/betting so it's hard for me to make any suggestions in that domain. Any use in keeping track of the riders info (if possible)?

Link to comment
Share on other sites

Link to post
Share on other sites

One suggestion that i can give you is that whenever you bring data from the "outside" is to dump all the data in tables that your application don't use and then use a `Stored Procedure` to import the actual data in your production tables.

 

this is useful because you work with local data when you copy in the production tables, also you can do data manipulation or conversion using the db itself and not an external programs, so it's less likely that you import bad data.

 

One advice if you go this route is to use a couple of column in this new table, PROCESSED to check if is processed or not, or it happened some error (think like `Yes`, `No` or `Error` ) and a field to print the error description and another to see which process processed the row last.

Link to comment
Share on other sites

Link to post
Share on other sites

I'm also not very familiar with horse racing, but these might help to give extra info to consider:

 

1. Are odds given against both the horse and a rider? If so, there's no provision for the rider yet.

2. Bookies would change odds. In a 10 horse race, the chance of horse1 coming first might be a lot different when the race is announced, to the day before the race if 5 or the horses become ill & can't race.

3. You could look at a bookies - odds linking table just having the responsibility of holding IDs to those 2 (other) tables. that way, if an odd (what's the singular of odds??) is the same for 2 bookies, you're not duplicating the data & having 2 rows in your odds table.

4. In the races table, your location could be perhaps an ID. I've no idea with racing, but could it be probable that a city might have several racetracks?

5. If you might want analytics or prediction capabilities in the future, it would be good to have all the details of the race captured. How many starters, who took part, what weather conditions were at play, did the race start on time, how many sugar cubes did the horse have before the race, was it whipped or hugged as it went around the track...

6. If the likelihood of this expansion & inclusion of other sports is already in your mind, I'd be tempted to have a SportType table already and maybe AnimalType already at this stage.

Link to comment
Share on other sites

Link to post
Share on other sites

  1. Relations (tables) should be named in the singular. A relation is to a row as a cookie cutter is to the cookie; the relation has attributes, each row is a concrete instance of that relation that fills in those attributes. Among other things this makes the naming scheme of foreign key relationships clearer.
  2. The Race relation needs to be normalized. As presumable multiple races will occur at a given location, you need to split out location into another table to avoid duplicate data.
  3. Given your current design, the Odds relation doesn't need an id. You can create a multi-part primary key from the Horse ID, Race ID, and Bookie ID as this should form a unique combination. As the Odds ID is not referenced in another table, it is wasted space.
  4. You should be storing a unique salt for use in hashing the user password. This hardens your authentication scheme against attacks such as a Rainbow Table.
Link to comment
Share on other sites

Link to post
Share on other sites

2 hours ago, alex_read said:

I'm also not very familiar with horse racing, but these might help to give extra info to consider:

 

1. Are odds given against both the horse and a rider? If so, there's no provision for the rider yet.

2. Bookies would change odds. In a 10 horse race, the chance of horse1 coming first might be a lot different when the race is announced, to the day before the race if 5 or the horses become ill & can't race.

3. You could look at a bookies - odds linking table just having the responsibility of holding IDs to those 2 (other) tables. that way, if an odd (what's the singular of odds??) is the same for 2 bookies, you're not duplicating the data & having 2 rows in your odds table.

4. In the races table, your location could be perhaps an ID. I've no idea with racing, but could it be probable that a city might have several racetracks?

5. If you might want analytics or prediction capabilities in the future, it would be good to have all the details of the race captured. How many starters, who took part, what weather conditions were at play, did the race start on time, how many sugar cubes did the horse have before the race, was it whipped or hugged as it went around the track...

6. If the likelihood of this expansion & inclusion of other sports is already in your mind, I'd be tempted to have a SportType table already and maybe AnimalType already at this stage.


I don't know a whole lot about it myself, but I am doing this for a friend as a fun project. Plus, he will pay for the server and everything else.
 

1. Odds are only given for horses.... or maybe horse/rider combos. It's just one, anyway.
2. The odds will change, yes, but we only need to know the odds as they stand currently. Hence, I will have a cronjob that pulls in new races each day and updates the odds in the database for the current day's races every 30 minutes.

3. That is an interesting idea. Could save space. However, it would be complicated because as you say any bookie could change their odds at any time, and then the two bookies would no longer match...
4. Yeah, that's a good idea. I forgot to normalize the location.
5. Analytics and prediction isn't necessary. The way my friend bets uses heuristics, taking into account only the immediate standing with regards to how odds compare. Any past performance of a horse, weather conditions, timing etc. is not important because the assumption is that this information is all already integrated by the "Real" odds which are taken from a betting exchange (which is a place where rather than bet with bookies you bet with each other on a similar model to a stock/currency/commodity exchange).

6. You're right. I should generalise the odds table so it can accept any sport. I could make it so that rather than have a Race_ID, it has an Event_ID... and an Event can map to a race, a football match, a golf tournament etc. Though, I am not sure how to do that... presumably the Event table would have to have a foreign key field that could reference one of multiple tables? Is there anything in a relational database that can do that?

 

1 hour ago, SSL said:
  1. Relations (tables) should be named in the singular. A relation is to a row as a cookie cutter is to the cookie; the relation has attributes, each row is a concrete instance of that relation that fills in those attributes. Among other things this makes the naming scheme of foreign key relationships clearer.
  2. The Race relation needs to be normalized. As presumable multiple races will occur at a given location, you need to split out location into another table to avoid duplicate data.
  3. Given your current design, the Odds relation doesn't need an id. You can create a multi-part primary key from the Horse ID, Race ID, and Bookie ID as this should form a unique combination. As the Odds ID is not referenced in another table, it is wasted space.
  4. You should be storing a unique salt for use in hashing the user password. This hardens your authentication scheme against attacks such as a Rainbow Table.

 

Thanks for these. The table naming will be handled by the ORM that will generate the DB anyway, so it should not be a problem. Alex mentioned normalization too, I will do that. You're right that the Odds relation doesn't need the ID. I can get rid of that.

I will be using BCrypt for the password management (strictly speaking, Devise, which is a user account management framework for RoR) and I believe BCrypt stores the salt as part of the hash. My understanding is it has a little header at the beginning which gives you information like what hashing algorithm was used, and how many times it was applied (the cost) and then it has the salt, with the password hash then appended on the end.

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×