Jump to content

MySQL join sequence confusion

Go to solution Solved by Franck,
35 minutes ago, IncrediblePony said:

 

For full disclosure I come up with this query. It's ugly AF and it runs, but it takes about two to three minutes to complete


SELECT p.project_id, p.hourly_rate, ex.sum_elapsed
FROM projects p
  JOIN (
      SELECT i.project_id, SUM(iv.seconds_elapsed) AS sum_elapsed
      FROM issues i
      LEFT JOIN intervals iv ON i.issue_id = iv.issue_id AND (i.internal_note NOT LIKE '%SKIF%') AND (iv.dt_start BETWEEN "2019-03-01T00:00:00Z" AND "2019-03-08T23:59:59Z")
      GROUP BY i.project_id
    ) AS ex ON p.project_id = ex.project_id AND p.project_name NOT LIKE 'K - %';

I haven't yet compared it to your work, but I'm almost certain that these are somewhat similar. I have 82 possible projects, 53.094 issues and 69.972 time intervals as of last friday. So some sort of handling has to be done in order for it to run in a timely fashion.

 

Don't add the condition filter on the LEFT join. This cause to match the records AND apply the filter at the same time. Put a standard WHERE instead which will prefilter the records and cut a ton load of iterations.

 

so the last line instead of 

  ) AS ex ON p.project_id = ex.project_id AND p.project_name NOT LIKE 'K - %';

do this

  ) AS ex ON p.project_id = ex.project_id WHERE p.project_name NOT LIKE 'K - %';

 

Hello!

 

I have a question regarding some tables I want to join to retrieve information about some time spent on different projects. I have three tables:

projects p
COLS: | id(PK) | project_id | project_name | deleted | hourly_rate |

issues i
COLS: | id(PK) | issue_id | issue_name | is_open | project_id | internal_note |

intervals iv
COLS: | id(PK) | manuscript_person_id | issue_id | interval_id | seconds_elapsed |

Now - I need to join these three together so I get total elapsed seconds in a project based on the issues in the given projects.

 

The the `p.project_name` must not start with the prefix "K - ".

The `i.internal_note` must not contain the letters in the following order "SKIF"

 

I have written a query that fetches all projects that doesn't contain the prefix "K - ":

SELECT p.project_id, p.hourly_rate
FROM projects p WHERE p.project_name NOT LIKE "K - %"

And I have also written a query that finds all issues in those projects that does not have the "K - " prefix and internal_note does not contain "SKIF"

SELECT i.issue_id, i.internal_note FROM issues i
JOIN projects p ON i.project_id = p.project_id
WHERE project_name NOT LIKE "K - %" AND i.internal_note NOT LIKE "%SKIF%"

 

My problem now lies in tying these together with the summed up elapsed seconds from intervals grouped first by i.issue_id and then by p.project_id in the end so I will end up with a result that looks like so:

    | p.project_id | p.hourly_rate | some_join.elapsed_seconds |
----|--------------|---------------|---------------------------|
1   | 836          | 900           | 367929                    |
----|--------------|---------------|---------------------------|
2   | 928          | 950           | 367929                    |
----|--------------|---------------|---------------------------|
3   | 1125         | 942           | 367929                    |
----|--------------|---------------|---------------------------|
4   | 1337         | 827           | 367929                    |
----------------------------------------------------------------
etc.....

How would one suggest to attack this problem?

A simple software developer from the far away land of Denmark

Link to comment
Share on other sites

Link to post
Share on other sites

Join is a Cartesian product of two relations  minus their duplicated shared attributes. You can use a nested queries to retrieve the Cartesian products of the tupples relevant to your queries and then do use the select operator to narrow down the attributes you desire in your relations. If you use the natural join, you no longer have to specify which foreign keys should be equivalent because natural join automatically does that for you. 

Sudo make me a sandwich 

Link to comment
Share on other sites

Link to post
Share on other sites

You can technically cheat by summing values per records of the second query

 

SELECT i.issue_id, i.internal_note, (SELECT TOP 1 SUM(seconds_elapsed) FROM intervals WHERE intervals.issue_id = i.issue_id) as 'Total Elapsed Seconds' FROM issues i
JOIN projects p ON i.project_id = p.project_id
WHERE project_name NOT LIKE "K - %" AND i.internal_note NOT LIKE "%SKIF%"

If you look this simply added a column that is the result of a select on the third table based on the current iteration of the main query. I don't have a MySQL ready on hand right now it's my day off but that would be a simple quick and dirty version to grab more of the data you need.

 

But overall you wouldn't want that query as your final result. that's not what your output is based on, What you result show is you want a list of project first, then sums of other data so it should be more like :

 

SELECT 
	projects.project_id, 
	projects.hourly_rate, 
	(
		SELECT SUM(seconds_elapsed) 
		FROM intervals 
		LEFT JOIN issues ON intervals.issue_id = issues.issue_id 
		WHERE issues.internal_note NOT LIKE '%SKIF%' AND
		      issues.project_id = projects.project_id
	) as 'elapsed_seconds'
FROM projects WHERE projects.project_name NOT LIKE 'K - %'

 Not 100% it runs, again i don't have access to SQL right now. But the logic is good or at least very close.

Link to comment
Share on other sites

Link to post
Share on other sites

On 3/8/2019 at 7:58 PM, Franck said:

You can technically cheat by summing values per records of the second query

 


SELECT i.issue_id, i.internal_note, (SELECT TOP 1 SUM(seconds_elapsed) FROM intervals WHERE intervals.issue_id = i.issue_id) as 'Total Elapsed Seconds' FROM issues i
JOIN projects p ON i.project_id = p.project_id
WHERE project_name NOT LIKE "K - %" AND i.internal_note NOT LIKE "%SKIF%"

If you look this simply added a column that is the result of a select on the third table based on the current iteration of the main query. I don't have a MySQL ready on hand right now it's my day off but that would be a simple quick and dirty version to grab more of the data you need.

 

But overall you wouldn't want that query as your final result. that's not what your output is based on, What you result show is you want a list of project first, then sums of other data so it should be more like :

 


SELECT 
	projects.project_id, 
	projects.hourly_rate, 
	(
		SELECT SUM(seconds_elapsed) 
		FROM intervals 
		LEFT JOIN issues ON intervals.issue_id = issues.issue_id 
		WHERE issues.internal_note NOT LIKE '%SKIF%' AND
		      issues.project_id = projects.project_id
	) as 'elapsed_seconds'
FROM projects WHERE projects.project_name NOT LIKE 'K - %'

 Not 100% it runs, again i don't have access to SQL right now. But the logic is good or at least very close.

 

For full disclosure I come up with this query. It's ugly AF and it runs, but it takes about two to three minutes to complete

SELECT p.project_id, p.hourly_rate, ex.sum_elapsed
FROM projects p
  JOIN (
      SELECT i.project_id, SUM(iv.seconds_elapsed) AS sum_elapsed
      FROM issues i
      LEFT JOIN intervals iv ON i.issue_id = iv.issue_id AND (i.internal_note NOT LIKE '%SKIF%') AND (iv.dt_start BETWEEN "2019-03-01T00:00:00Z" AND "2019-03-08T23:59:59Z")
      GROUP BY i.project_id
    ) AS ex ON p.project_id = ex.project_id AND p.project_name NOT LIKE 'K - %';

I haven't yet compared it to your work, but I'm almost certain that these are somewhat similar. I have 82 possible projects, 53.094 issues and 69.972 time intervals as of last friday. So some sort of handling has to be done in order for it to run in a timely fashion.

A simple software developer from the far away land of Denmark

Link to comment
Share on other sites

Link to post
Share on other sites

35 minutes ago, IncrediblePony said:

 

For full disclosure I come up with this query. It's ugly AF and it runs, but it takes about two to three minutes to complete


SELECT p.project_id, p.hourly_rate, ex.sum_elapsed
FROM projects p
  JOIN (
      SELECT i.project_id, SUM(iv.seconds_elapsed) AS sum_elapsed
      FROM issues i
      LEFT JOIN intervals iv ON i.issue_id = iv.issue_id AND (i.internal_note NOT LIKE '%SKIF%') AND (iv.dt_start BETWEEN "2019-03-01T00:00:00Z" AND "2019-03-08T23:59:59Z")
      GROUP BY i.project_id
    ) AS ex ON p.project_id = ex.project_id AND p.project_name NOT LIKE 'K - %';

I haven't yet compared it to your work, but I'm almost certain that these are somewhat similar. I have 82 possible projects, 53.094 issues and 69.972 time intervals as of last friday. So some sort of handling has to be done in order for it to run in a timely fashion.

 

Don't add the condition filter on the LEFT join. This cause to match the records AND apply the filter at the same time. Put a standard WHERE instead which will prefilter the records and cut a ton load of iterations.

 

so the last line instead of 

  ) AS ex ON p.project_id = ex.project_id AND p.project_name NOT LIKE 'K - %';

do this

  ) AS ex ON p.project_id = ex.project_id WHERE p.project_name NOT LIKE 'K - %';

 

Link to comment
Share on other sites

Link to post
Share on other sites

On 3/12/2019 at 12:49 PM, Franck said:

 

Don't add the condition filter on the LEFT join. This cause to match the records AND apply the filter at the same time. Put a standard WHERE instead which will prefilter the records and cut a ton load of iterations.

 

so the last line instead of 


  ) AS ex ON p.project_id = ex.project_id AND p.project_name NOT LIKE 'K - %';

do this


  ) AS ex ON p.project_id = ex.project_id WHERE p.project_name NOT LIKE 'K - %';

 

NICE! Well spotted!

A simple software developer from the far away land of Denmark

Link to comment
Share on other sites

Link to post
Share on other sites

On 3/8/2019 at 6:38 PM, wasab said:

Join is a Cartesian product of two relations  minus their duplicated shared attributes. You can use a nested queries to retrieve the Cartesian products of the tupples relevant to your queries and then do use the select operator to narrow down the attributes you desire in your relations. If you use the natural join, you no longer have to specify which foreign keys should be equivalent because natural join automatically does that for you. 

I wish I followed relational algebra more closely at uni. Would have made this problem so much easier. Thank you for this!

A simple software developer from the far away land of Denmark

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×