Jump to content
Search In
  • More options...
Find results that contain...
Find results in...

mysql join help

dwang040
 Share

 

So, I currently have a database where I have 3 tables, Employee, Schedule, and Shift (structure shown below). My goal is to write a select statement that will take the average amount earned per empType and multiply it by the total numbers of hours that the empTypes work. 

DESCRIBE EMPLOYEE;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| empId     | varchar(6)  | YES  |     | NULL    |       |
| lastName  | varchar(32) | YES  |     | NULL    |       |
| firstName | varchar(32) | YES  |     | NULL    |       |
| cellphone | varchar(14) | YES  |     | NULL    |       |
| homephone | varchar(14) | YES  |     | NULL    |       |
| empType   | varchar(6)  | YES  |     | NULL    |       |
| ftpt      | varchar(2)  | YES  |     | NULL    |       |
| pay       | varchar(7)  | YES  |     | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
DESCRIBE SCHEDULE;
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| scheduleId | varchar(6)  | YES  |     | NULL    |       |
| empId      | varchar(6)  | YES  |     | NULL    |       |
| date       | varchar(10) | YES  |     | NULL    |       |
| shiftId    | varchar(6)  | YES  |     | NULL    |       |
| dept       | varchar(32) | YES  |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+
DESCRIBE SHIFT;
+-------------+------------+------+-----+---------+-------+
| Field       | Type       | Null | Key | Default | Extra |
+-------------+------------+------+-----+---------+-------+
| shiftId     | varchar(6) | YES  |     | NULL    |       |
| startTime   | varchar(4) | YES  |     | NULL    |       |
| shiftLength | varchar(2) | YES  |     | NULL    |       |
+-------------+------------+------+-----+---------+-------+

I tried this, thinking that by using a join, I could individually get the avg from EMPLOYEE and then get the sum from SCHEDULE and SHIFT, but for some reason, it's just not right. The avg pay per empType is still not correct. I've tried various types of joins but they all produce the same result.

SELECT t1.empType, FORMAT(AVG(REPLACE(t1.pay, "$", "")),2) AS "AVG PAY", SUM(t3.shiftLength)
FROM EMPLOYEE AS t1 JOIN SCHEDULE AS t2
ON t1.empId = t2.empId
JOIN SHIFT AS t3
ON t2.shiftId = t3.shiftId
GROUP BY empType;

RESULT AVG PAY is incorrect as you can see compared to the results posted below. The shiftLength is correct however. My guess is that the AVG PAY is
somehow the AVG based off of SCHEDULE and not based off of EMPLOYEE? But I thought that through joining them, I could get each values individually:
+---------+---------+---------------------+
| empType | AVG PAY | SUM(t3.shiftLength) |
+---------+---------+---------------------+
| LPN     | 31.59   |                1508 |
| NA      | 33.06   |                1288 |
| RN      | 29.71   |                1200 |
+---------+---------+---------------------+

This is supposed to be the correct numbers. 

+---------+---------+---------------------+
| empType | AVG PAY | SUM(t3.shiftLength) |
+---------+---------+---------------------+
| LPN     | 30.91   |                1508 |
| NA      | 33.92   |                1288 |
| RN      | 33.10   |                1200 |
+---------+---------+---------------------+

Thanks!

Link to comment
Share on other sites

Link to post
Share on other sites

Pay is a varchar. I believe this means that when taking an average, it will exclude any blank/null values rather than treat then as 0. Are there any blanks in that column?

I would suggest converting pay to an integer field, and depending on you currency store the lowest denomination. E.g. if in $, store the number of cent. so $10.45 = 1045
 

Link to comment
Share on other sites

Link to post
Share on other sites

MySQL has a double type as far as I'm aware...

 

@dwang040 Could you give us some sample data, so that we can get a better idea of where your SQL is going wrong?

75% of what I say is sarcastic

 

So is the rest probably

Link to comment
Share on other sites

Link to post
Share on other sites

38 minutes ago, myselfolli said:

MySQL has a double type as far as I'm aware...

Yes but it is best not to use for currency.

 

This gives a better explanation than I can: https://stackoverflow.com/a/3730040/9863418

TL;DR: doubles & floats are represented as 2^n whereas currency is always 10^n (n may be negate in both cases) and so if you use a double or a float eventually you will have rounding error that you could have easily avoided.

Link to comment
Share on other sites

Link to post
Share on other sites

2 minutes ago, Meic said:

Yes but it is best not to use for currency.

 

This gives a better explanation than I can: https://stackoverflow.com/a/3730040/9863418

TL;DR: doubles & floats are represented as 2^n whereas currency is always 10^n (n may be negate in both cases) and so if you use a double or a float eventually you will have rounding error that you could have easily avoided.

Makes sense...

75% of what I say is sarcastic

 

So is the rest probably

Link to comment
Share on other sites

Link to post
Share on other sites

4 hours ago, Meic said:

Pay is a varchar. I believe this means that when taking an average, it will exclude any blank/null values rather than treat then as 0. Are there any blanks in that column?

I would suggest converting pay to an integer field, and depending on you currency store the lowest denomination. E.g. if in $, store the number of cent. so $10.45 = 1045
 

I used to think that too, but the REPLACE has actually solved it (unless I am mistaken and somehow the data just appears to work). my values used to return 0/ null when doing calculations, however, by removing the $, I believe it is able to convert "$10.45" to a varchar that's just "10.45", and by using AVG or SUM or even just doing (REPLACE(pay, "$", "") + 0) it is able to turn that from a varchar to a decimal/ something useable for calculations.

 

37 minutes ago, myselfolli said:

MySQL has a double type as far as I'm aware...

 

@dwang040 Could you give us some sample data, so that we can get a better idea of where your SQL is going wrong?

 Sure, here's the first 5 sample data from each table:

Table EMPLOYEE:
+--------+----------+-----------+----------------+----------------+---------+------+---------+
| empId  | lastName | firstName | cellphone      | homephone      | empType | ftpt | pay     |
+--------+----------+-----------+----------------+----------------+---------+------+---------+
| 919675 | Cermak   | Vern      | (138)7198-0862 | (655)2249-9926 | LPN     | FT   | $46.80  |
| 906704 | Paille   | Lorilee   | (858)1100-2722 | (377)1506-1986 | NA      | PT   | $47.26  |
| 900486 | Ober     | Shalanda  | (210)1508-8132 | (820)1612-0197 | NA      | FT   | $21.07  |
| 883367 | Clyburn  | Jeffry    | (420)7798-7220 | (476)7661-3070 | LPN     | FT   | $30.62  |
| 953807 | Quarles  | Timmy     | (513)2756-9892 | (562)9034-2889 | LPN     | FT   | $38.54  |
+--------+----------+-----------+----------------+----------------+---------+------+---------+

Table SCHEDULE:
+------------+--------+-----------+---------+------------+
| scheduleId | empId  | date      | shiftId | dept       |
+------------+--------+-----------+---------+------------+
| 1          | 943023 | 9/24/2018 | 3       | RADIOLOGY  |
| 2          | 891046 | 9/24/2018 | 1       | SURGERY    |
| 3          | 878397 | 9/24/2018 | 5       | RADIOLOGY  |
| 4          | 886190 | 9/24/2018 | 3       | CARDIOLOGY |
| 5          | 878397 | 9/24/2018 | 5       | RADIOLOGY  |
+------------+--------+-----------+---------+------------+

Table SHIFt:
+---------+-----------+-------------+
| shiftId | startTime | shiftLength |
+---------+-----------+-------------+
| 1       | 07AM      | 8           |
| 3       | 03PM      | 8           |
| 5       | 11PM      | 8           |
| 2       | 07AM      | 12          |
| 4       | 07PM      | 12          |
+---------+-----------+-------------+

 

Link to comment
Share on other sites

Link to post
Share on other sites

@dwang040 What happens if you break the SQL down and only try to calculate the averages:

SELECT t1.empType, FORMAT(AVG(REPLACE(t1.pay, "$", "")),2) GROUP BY empType;

Do you then get the correct results?

Link to comment
Share on other sites

Link to post
Share on other sites

1 minute ago, Meic said:

@dwang040 What happens if you break the SQL down and only try to calculate the averages:


SELECT t1.empType, FORMAT(AVG(REPLACE(t1.pay, "$", "")),2) GROUP BY empType;

Do you then get the correct results?

Not sure if this is what you meant but I will get the correct answer averages if I solve for pay unrelated to SCHEDULE and SHIFT. So as shown above, if I don't include the FROM, WHERE, and JOIN with SCHEDULE and SHIFT, then my output is correct. 

SELECT empType, FORMAT(AVG(REPLACE(pay, "$", "")), 2) AS "AVG PAY" FROM EMPLOYEE GROUP BY empType;
+---------+---------------------------------------+
| empType | AVG PAY |
+---------+---------------------------------------+
| LPN     | 30.91                                 |
| NA      | 33.92                                 |
| PHLEB   | 34.67                                 |
| RN      | 33.10                                 |
| ULTRA   | 27.68                                 |
| XRAY    | 35.90                                 |
+---------+---------------------------------------+
6 rows in set (0.00 sec)

mysql> SELECT t1.empType, FORMAT(AVG(REPLACE(t1.pay, "$", "")), 2) AS "AVG PAY" FROM EMPLOYEE AS t1 GROUP BY empType;
+---------+---------+
| empType | AVG PAY |
+---------+---------+
| LPN     | 30.91   |
| NA      | 33.92   |
| PHLEB   | 34.67   |
| RN      | 33.10   |
| ULTRA   | 27.68   |
| XRAY    | 35.90   |
+---------+---------+

 

Link to comment
Share on other sites

Link to post
Share on other sites

Alright, so a few things:

-there are more data types than varchar, please use them. In your specific example, at least int and date would be useful

-don't include the currency in the employees pay. You can either have another column, in which you save the currency, or (if they all get paid in the same currency), don't bother saving it in your database

-have a look at primary and foreign keys, to ensure referential integrity

 

As for your SQL:

As @Meic already figured out, since the average is correct when you don't include your joins, the calculation of your averages is obviously correct. Your joins are off.

 

Here's how I'd do it:

SELECT
	employee.empType AS empType,
    AVG(employee.pay) AS avgPay,
    SUM(shift.shiftLength) AS shiftLength
FROM
	employee,
    schedule,
    shift
WHERE
	employee.empId = schedule.empId
AND
	schedule.shiftId = shift.shiftId
GROUP BY employee.empType;

 

A few key differences:

-I modified some of your data types (pay is currently a float, which as @Meic already pointed out, may lead to problems)

-I dislike the "JOIN"-Keyword. I'd always opt for joining tables manually, I find them to be more predictable

 

With all this in mind, I did actually run the SQL you see above, it worked for me...

 

See the attatched sql for the data I worked with:

test.sql

75% of what I say is sarcastic

 

So is the rest probably

Link to comment
Share on other sites

Link to post
Share on other sites

45 minutes ago, myselfolli said:

Alright, so a few things:

-there are more data types than varchar, please use them. In your specific example, at least int and date would be useful

-don't include the currency in the employees pay. You can either have another column, in which you save the currency, or (if they all get paid in the same currency), don't bother saving it in your database

-have a look at primary and foreign keys, to ensure referential integrity

 

As for your SQL:

As @Meic already figured out, since the average is correct when you don't include your joins, the calculation of your averages is obviously correct. Your joins are off.

 

ere's how I'd do it:


SELECT
	employee.empType AS empType,
    AVG(employee.pay) AS avgPay,
    SUM(shift.shiftLength) AS shiftLength
FROM
	employee,
    schedule,
    shift
WHERE
	employee.empId = schedule.empId
AND
	schedule.shiftId = shift.shiftId
GROUP BY employee.empType;

 

1

Damn, if the code worked for you, I guess it must be (as you guys said) a floating point error then. I remember trying something similar to what you did earlier but again, for some reason, it'll work as long as no tables are related to each other through joins of any sort. 

SELECT EMPLOYEE.empType AS empType, AVG(REPLACE(EMPLOYEE.pay, "$", "")) AS avgPay, SUM(SHIFT.shiftLength) AS "shiftLength"
    -> FROM EMPLOYEE, SCHEDULE, SHIFT
    -> WHERE EMPLOYEE.empId = SCHEDULE.empId AND SCHEDULE.shiftId = SHIFT.shiftId
    -> GROUP BY EMPLOYEE.empType;
+---------+------------------+-------------+
| empType | avgPay           | shiftLength |
+---------+------------------+-------------+
| LPN     | 31.5889634146341 |        1508 |
| NA      | 33.0552112676056 |        1288 |
| RN      | 29.7053435114504 |        1200 |
+---------+------------------+-------------+
3 rows in set (0.05 sec)

Well, if that really is the case, I guess it would be a good idea and a good time to change all my var types and didn't add PKs and FKs. The csv we got for this project were all strings and being a bit lazy, I just shoved everything into a varchar. It's the same reason why pay was put under the employee table, because I don't have a fix number of pay values and the random pay values were given as part of the employee csv. Can definitely give the modified values a try and see if that helps. 

 

UPDATE: Changed the data types, added pk and fk. Still no good. I didn't mention this clearly enough, but when I say that the avg pay is based off of schedule vs employee, I guess I should clarify in case there's any confusion. Let's say I have 50 employees working, and they all have a pay. The avg separate would be getting sum/ 50. However, not all 50 of those employees are actually working in the schedule. So, while my avg is returning 31.58896.... if I were to add another employee to the schedule who wasn't on it previously, my avg would change to 31.696969...

INSERT INTO SCHEDULE(scheduleId, empId, `date`, shiftId, dept) VALUES ("438", "995731", "2018-10-08", "3", "EMERGENCY");

mysql> SELECT AVG(EMPLOYEE.pay) FROM EMPLOYEE, SCHEDULE, SHIFT WHERE EMPLOYEE.empId = SCHEDULE.empId AND SCHEDULE.shiftId = SHIFT.shiftId AND empType = "LPN" GROUP BY empType;
+-------------------+
| AVG(EMPLOYEE.pay) |
+-------------------+
|         31.636363 |
+-------------------+

INSERT INTO SCHEDULE(scheduleId, empId, `date`, shiftId, dept) VALUES ("439", "995729", "2018-10-08", "3", "EMERGENCY");

mysql> SELECT AVG(EMPLOYEE.pay) FROM EMPLOYEE, SCHEDULE, SHIFT WHERE EMPLOYEE.empId = SCHEDULE.empId AND SCHEDULE.shiftId = SHIFT.shiftId AND empType = "LPN" GROUP BY empType;
+-------------------+
| AVG(EMPLOYEE.pay) |
+-------------------+
|         31.536325 |
+-------------------+

 

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share


×