Jump to content

SQL Query Where Assistance Required

Fallen Soul
Go to solution Solved by Fallen Soul,
1 hour ago, Eigenvektor said:

You should really think about normalizing your database, as @minibois said.

 

You're doing things in your database that have no business being there. It'll make your queries a lot slower and require tons of workarounds in the end. How you present parents (e.g. "and" between them, female first, male second, etc.) is presentation logic. That has no business being in your data layer, since requirements for this can change in the future.

 

Have a table for persons, then have a separate table for relationships between them.

{ person_id, first_name, last_name, … }

{ relation_id, child_id, parent_id }

 

You could also add a relationship status. For example, what about people who have natural parents and foster parents? How would you present those in your combined column? This makes queries very awkward as you already discovered, require multiple queries and then additional processing in your code when you have a result.

 

With normalized data, querying e.g. only for the mother would become a lot easier.


select …
  from person child
  join relation r on r.child_id = child.person_id
  join person mother on mother.person_id = r.parent_id and mother.sex = 'f'

 

I understand, however that would be out of my hands. On top of that, the database belongs to the software TASS and we don't plan to go around modifying it. This is the only case at this stage where we are inserting data from the back end. Once this is done, it's up to HR to ensure, the data is put in correctly and we will most likely generate a report weekly to ensure that they are. 

I have ended up getting it to do what we want now anyway. Just waiting the go ahead to actually run it. 


Either way, Thank you for the info i will look back at this maybe at a later date if need be. 

Also thanks to @Minibois for all the help you have provided. 

Hi, 

I am in need to find out which accounts in our DB has 3 or more emails attached to them. 
The column names in the table being used are both e_mail and e_mail2

Currently i have a where clause put in place that looks for the delimiter ; but this will find all with multiple emails. 

How can i only show ones with email counts of 3 and above? 

The systems DB is TASS , not that this will be of any use but thought i would state it just encase some one else is using it. 

Thanks for any help in advanced. 

 

 

Link to comment
Share on other sites

Link to post
Share on other sites

Can you give a bit more information, like a (rough) drawing of the db?

Where are the email adresses stored, compared to the user accounts? Did you describe one table, which contains the user account info, e_mail and e_mail2?

How would you figure out if an account had three e-mails attached to it? Typically there are two tables, one with email-addresses and another with account info and these would be linked.

"We're all in this together, might as well be friends" Tom, Toonami.

 

mini eLiXiVy: my open source 65% mechanical PCB, a build log, PCB anatomy and discussing open source licenses: https://linustechtips.com/topic/1366493-elixivy-a-65-mechanical-keyboard-build-log-pcb-anatomy-and-how-i-open-sourced-this-project/

 

mini_cardboard: a 4% keyboard build log and how keyboards workhttps://linustechtips.com/topic/1328547-mini_cardboard-a-4-keyboard-build-log-and-how-keyboards-work/

Link to comment
Share on other sites

Link to post
Share on other sites

1 hour ago, minibois said:

Can you give a bit more information, like a (rough) drawing of the db?

Where are the email adresses stored, compared to the user accounts? Did you describe one table, which contains the user account info, e_mail and e_mail2?

How would you figure out if an account had three e-mails attached to it? Typically there are two tables, one with email-addresses and another with account info and these would be linked.

There are multiple tables depending if its an existing 'User' or an 'enrolling user' . The user table and enrolling users both have another table that stores address information called i think. paraddr (user address) and enrparaddr (enrolling users)

 

either way the information i am focusing on is only stored in the the two columns e_mail and e_mail2 which are in tables enrparaddr and paraddr.

I have already got the select xxx from xx and join xxx on xx done up, I am just trying to work out, how to filter the info in the where clause to only show 3 and more email address linked to them. 

So for what ever HR has added three emails into the one field (e_mail) and it will looks something like:   johndoe@email.com; janedoe@email.com; freddoe@email.com

 

Please note i am a complete noob at sql or any scripting to be exact 

P.S. I will post my current code up when i get to work in about an hour. 

 

 

 

Current Script: 
 

select pa.par_code,

    pa.par_name AS 'name',

    CASE
        WHEN pa.e_mail IS NULL
            THEN pa.e_mail2
        ELSE pa.e_mail
    END AS email

--p.status_flg,

--pa.add_num

from parent as p 

    join paraddress as pa
        on p.par_code = pa.par_code
            and p.cmpy_code = pa.cmpy_code

where status_flg = 'c'
    AND pa.add_num = 1
    AND pa.e_mail LIKE '%;%'

union all 

SELECT DISTINCT 
    ep.par_code, 
    ep.salutation AS 'name',
    Case
        when ep.e_mail IS NULL
            then ep.e_mail2
        else ep.e_mail
    END as 'email'

FROM studenrol AS se


    JOIN enrparaddr AS ep
        on se.par_code = ep.par_code
        and se.cmpy_code = ep.cmpy_code

    JOIN parenrol AS pe
        on ep.par_code = pe.par_code

    JOIN family AS f
        on se.par_code = f.entity_code

LEFT OUTER JOIN paraddress as pa
    ON se.par_code = pa.par_code
    AND pa.add_num = 1


WHERE se.entry_yr = YEAR(GETDATE())+1 
    and se.canc_flg <> 'Y'
    and se.place_offered = 'Y'
    and ep.add_num = 1
    and se.cmpy_code = '01'
    AND ep.e_mail LIKE '%;%'

 

Link to comment
Share on other sites

Link to post
Share on other sites

3 minutes ago, abass44 said:

So for what ever HR has added three emails into the one field (e_mail) and it will looks something like:   johndoe@email.com; janedoe@email.com; freddoe@email.com

This is called a "multi-valued attribute" and it's one sins in most database principles. A cell should only contain one piece of information, not multiple.

This is how a database should be set up (or really one of the example of how to) for how to setup a database where users can have multiple e-mail adresses:

image.png.32d59a88353a71360bd549c27a001c14.png

 

Getting rid of multi-valued attributes is step one in normalization of a database and makes sure you can effectively and easily retrieve the needed information.

That way you can just do this:

SELECT students.name, COUNT(*) AS amount_of_emails
FROM email
JOIN students ON students.id = email.id
GROUP BY(email.id)
HAVING COUNT(students.id) >= 3

(this can probably be made with a subquery too, I didn't have the brains to make that right now though)

 

So that way you have this DB:

image.png.16fab2913f0f0bd9efcddfd5b8e645ab.png

 

And the above query gives me this result:

image.png.4d03d5a8c11109f37f9dbf46b5305e8b.png

But that would require changing the DB, likely having to make a bunch of import scripts.. Just showing you how this should be done

 

 

Aaaaanyways, this is the code you're looking for:

SELECT id, name
FROM studentsEMAILS
WHERE (CHAR_LENGTH(emailsAddresses) - CHAR_LENGTH(REPLACE(emailsAddresses, ';', '')) + 1) >= 3;

Replace "emailsAddresses" with whatever cell you're reading the string of email addresses from (assuming there is one column of e-mail adresses)

 

This just looks the full "emailAddresses" cell, i.e.:

Quote

John@gmail.com;John@hotmail.com;John@yahoo.com

Then it takes this full string again and gets rid of the semicolons (replacing them with nothing):

Quote

John@gmail.comJohn@hotmail.comJohn@yahoo.com

And substracts the first string, by the second one, leaving only the two semicolons inbetween the email addresses. That is why there is a "+1" at the end, to account for the email at the end that doesn't have a semicolon behind it. Than with a simply where statement it just look at the remaining number, to see if it is equal or above 3.

"We're all in this together, might as well be friends" Tom, Toonami.

 

mini eLiXiVy: my open source 65% mechanical PCB, a build log, PCB anatomy and discussing open source licenses: https://linustechtips.com/topic/1366493-elixivy-a-65-mechanical-keyboard-build-log-pcb-anatomy-and-how-i-open-sourced-this-project/

 

mini_cardboard: a 4% keyboard build log and how keyboards workhttps://linustechtips.com/topic/1328547-mini_cardboard-a-4-keyboard-build-log-and-how-keyboards-work/

Link to comment
Share on other sites

Link to post
Share on other sites

15 minutes ago, minibois said:

This is called a "multi-valued attribute" and it's one sins in most database principles. A cell should only contain one piece of information, not multiple.

This is how a database should be set up (or really one of the example of how to) for how to setup a database where users can have multiple e-mail adresses:

image.png.32d59a88353a71360bd549c27a001c14.png

 

Getting rid of multi-valued attributes is step one in normalization of a database and makes sure you can effectively and easily retrieve the needed information.

That way you can just do this:


SELECT students.name, COUNT(*) AS amount_of_emails
FROM email
JOIN students ON students.id = email.id
GROUP BY(email.id)
HAVING COUNT(students.id) >= 3

(this can probably be made with a subquery too, I didn't have the brains to make that right now though)

 

So that way you have this DB:

image.png.16fab2913f0f0bd9efcddfd5b8e645ab.png

 

And the above query gives me this result:

image.png.4d03d5a8c11109f37f9dbf46b5305e8b.png

But that would require changing the DB, likely having to make a bunch of import scripts.. Just showing you how this should be done

 

 

Aaaaanyways, this is the code you're looking for:


SELECT id, name
FROM studentsEMAILS
WHERE (CHAR_LENGTH(emailsAddresses) - CHAR_LENGTH(REPLACE(emailsAddresses, ';', '')) + 1) >= 3;

Replace "emailsAddresses" with whatever cell you're reading the string of email addresses from (assuming there is one column of e-mail adresses)

 

This just looks the full "emailAddresses" cell, i.e.:

Then it takes this full string again and gets rid of the semicolons (replacing them with nothing):

And substracts the first string, by the second one, leaving only the two semicolons inbetween the email addresses. That is why there is a "+1" at the end, to account for the email at the end that doesn't have a semicolon behind it. Than with a simply where statement it just look at the remaining number, to see if it is equal or above 3.

I am getting the following error with that line.
image.png.35bde3072f4d360ba2d5ae5f9e45b8fc.png

 

Just encase, i am using MS SQL and not MySQL for this DB

Link to comment
Share on other sites

Link to post
Share on other sites

3 minutes ago, abass44 said:

I am getting the following error with that line.
image.png.35bde3072f4d360ba2d5ae5f9e45b8fc.png

Just encase, i am using MS SQL and not MySQL for this DB

Looks like the MSSQL equivalent to (MySQL's) "CHAR_LENGTH" is "LEN".

Try replacing both instances of "CHAR_LENGTH" with "LEN".

"We're all in this together, might as well be friends" Tom, Toonami.

 

mini eLiXiVy: my open source 65% mechanical PCB, a build log, PCB anatomy and discussing open source licenses: https://linustechtips.com/topic/1366493-elixivy-a-65-mechanical-keyboard-build-log-pcb-anatomy-and-how-i-open-sourced-this-project/

 

mini_cardboard: a 4% keyboard build log and how keyboards workhttps://linustechtips.com/topic/1328547-mini_cardboard-a-4-keyboard-build-log-and-how-keyboards-work/

Link to comment
Share on other sites

Link to post
Share on other sites

6 minutes ago, minibois said:

Looks like the MSSQL equivalent to (MySQL's) "CHAR_LENGTH" is "LEN".

Try replacing both instances of "CHAR_LENGTH" with "LEN".

Just found the equivalent after replying: 

 

and (LEN(ep.e_mail) - LEN(REPLACE(ep.e_mail, ';', '')) + 1) >= 3

Looks like it's done the trick. 

Now to send this off to HR and get them to verify the correct info and then update the tables to put the 2nd email into e_mail2

Thanks for you help. 

I am am sure i will be back soon asking more questions. 

Link to comment
Share on other sites

Link to post
Share on other sites

4 minutes ago, abass44 said:

Now to send this off to HR and get them to verify the correct info and then update the tables to put the 2nd email into e_mail2

In the name of future expandability, I do have to recommend using a separate table for e-mails, rather than using an "email2" column, or a semicolon to place multiple e-mails in a single cell. 

A separate table really is the way to go in my opinion, but it's up to you and your judgement.

"We're all in this together, might as well be friends" Tom, Toonami.

 

mini eLiXiVy: my open source 65% mechanical PCB, a build log, PCB anatomy and discussing open source licenses: https://linustechtips.com/topic/1366493-elixivy-a-65-mechanical-keyboard-build-log-pcb-anatomy-and-how-i-open-sourced-this-project/

 

mini_cardboard: a 4% keyboard build log and how keyboards workhttps://linustechtips.com/topic/1328547-mini_cardboard-a-4-keyboard-build-log-and-how-keyboards-work/

Link to comment
Share on other sites

Link to post
Share on other sites

Okay @minibois, I have an example below for the next step:image.png.087f5c9773a902f6f5613fd65c9e0a8c.png


Red is what current data looks like for people in the system and Then green is what it should look like. 
The blue column "add_num" the vale for Mother(4), Father(5) and Postal(1). We can ignore postal for now, but i will need to look at this once The mother and Father data has been resolved. 

So basically i need to split the two email address from the "e_mail" field and move the second email into "e_mail2" field. 
If you look at postal (1) data for the parents, and they have two emails as well, i will need to have new entries created for mother and father if not already existing. 

How can i go about doing this?

After finishing off the original question i had spent the remainder of the day trying to work out this part with no success, only part i worked out was how to update current info and not insert new info. 

Current code: 
 

MERGE paraddress AS pa 
USING 
	(SELECT 
		par_code,
		SUBSTRING (e_mail ,0,CHARINDEX(';',e_mail)) as e1,
		SUBSTRING(e_mail ,CHARINDEX(';',e_mail)+1,LEN(e_mail)) AS e2  
	FROM paraddress) 
	AS p  
ON pa.par_code = p.par_code
WHEN MATCHED THEN
UPDATE SET 
pa.e_mail = p.e1,
pa.e_mail2 = p.e2;

 

Link to comment
Share on other sites

Link to post
Share on other sites

19 hours ago, abass44 said:

However, the part i am stuck on is, if there is a entry from the data who is assigned add_num '1' and has data in e_mail and e_mail2 field but has not been assigned another entry with the column entry of 4 or 5 i need to create it with the same information coming for the date in the data list that's assigned add_num '1'. 

This is extremely confusing to even try to explain it, so i can't imagine you will get it from reading this on the first go. Either when i get to work tomorrow or before then, i will try and get an example done up so you have a better understanding. 

Ehh, I didn't really understand what you meant from this example, so not sure what to tell you.

What I would recommend in these sorts of cases, where you have multiple things to check if they are true/false is make a truth table.


Just really simply just have a variable to see if "add_num" is 1 or anything else and have another variable to see if... well whatever you need to check in e_mail and e_mail2.

That way you make it a lot more simple for yourself to see in which cases you would need to do something with the data entry or not.

"We're all in this together, might as well be friends" Tom, Toonami.

 

mini eLiXiVy: my open source 65% mechanical PCB, a build log, PCB anatomy and discussing open source licenses: https://linustechtips.com/topic/1366493-elixivy-a-65-mechanical-keyboard-build-log-pcb-anatomy-and-how-i-open-sourced-this-project/

 

mini_cardboard: a 4% keyboard build log and how keyboards workhttps://linustechtips.com/topic/1328547-mini_cardboard-a-4-keyboard-build-log-and-how-keyboards-work/

Link to comment
Share on other sites

Link to post
Share on other sites

31 minutes ago, minibois said:

Ehh, I didn't really understand what you meant from this example, so not sure what to tell you.

What I would recommend in these sorts of cases, where you have multiple things to check if they are true/false is make a truth table.


Just really simply just have a variable to see if "add_num" is 1 or anything else and have another variable to see if... well whatever you need to check in e_mail and e_mail2.

That way you make it a lot more simple for yourself to see in which cases you would need to do something with the data entry or not.

Yeah i thought that might have been the case, i am not crash host at explaining stuff as it is and they system seems blah as it is. Ideally, for each parent we would want to create a new parent code, but HR doesn't want the extra work -_- but who does right. 

Any way, would you have an example of what this truth table would look like? ..... if you have time that is. 

Link to comment
Share on other sites

Link to post
Share on other sites

7 minutes ago, abass44 said:

Yeah i thought that might have been the case, i am not crash host at explaining stuff as it is and they system seems blah as it is. Ideally, for each parent we would want to create a new parent code, but HR doesn't want the extra work -_- but who does right. 

Any way, would you have an example of what this truth table would look like? ..... if you have time that is. 

I didn't fully understand the conditions and outcomes you're dealing with, so I wasn't sure what the conditions in the truth table would be.

The way a truth table works, is you have a statement you can answer yes/no to (i.e. "is the add_num value 1?") and then write out the different outcomes, based on this statement. You can add multiple statements, but you only have two outcomes, so it does depend on what the outcome(s) need to be.

 

 

The different truth tables I could come up with here, always have three outcomes. For example, when looking at:

"Does the user have two addressees in e_mail?"
"Is e_mail2 empty?"

 

There are four outcomes here. Both true, both false, or false-true or true-false respectively.

You just need to work out a truth table where you have outcomes, that come to two conclusions (do something with the data vs. do nothing). That way you have some data to actually make an if statement with.

 

"We're all in this together, might as well be friends" Tom, Toonami.

 

mini eLiXiVy: my open source 65% mechanical PCB, a build log, PCB anatomy and discussing open source licenses: https://linustechtips.com/topic/1366493-elixivy-a-65-mechanical-keyboard-build-log-pcb-anatomy-and-how-i-open-sourced-this-project/

 

mini_cardboard: a 4% keyboard build log and how keyboards workhttps://linustechtips.com/topic/1328547-mini_cardboard-a-4-keyboard-build-log-and-how-keyboards-work/

Link to comment
Share on other sites

Link to post
Share on other sites

image.png.519c48f485941f102ee41283a14ccdf7.pngimage.png.e13e58522e6f49dae6968203fe94db95.pngimage.png.519c48f485941f102ee41283a14ccdf7.png
Below is a image of the form that HR uses for parents in the database. 
image.png.519c48f485941f102ee41283a14ccdf7.png

As you can see the name Postal has the Number 1 next to it. This matches up with the column in the database called "add_num" 

(ignore the email fields in the image)

image.png.d761982a72268b2c10e6b2077b499628.png
 

Now the part i need to focus on will be the Mother (4) and Father (5) section of the form.
What i need to do, is go through the DB and check to see if the PAR_CODE has a entry for the Mother (4) and Father (5) and if they do, split the email into the correct email field according to the email address type. Email1 (e_mail) will be used for personal email and Email 2 (email_2) will be used for their business email address. 
As shown for the mother in this example below:

image.png.f62e3187519f6414edec5cc922e55b12.png

Database view: 
image.png.e13e58522e6f49dae6968203fe94db95.png

 

 

The part i guess i will say i am stuck on is, if there is not an entry for this individual with the par_code (Parent Code) .

i need to have it check to see if there is an entry by matching the par_code(parent code) and add_num value of 4 (mother) or 5 (father)  and if it doesn't find an entry with the par_code that has a add_num column of 4 or 5 to proceed and create a new entry. Taking the email from e_mail and e_mail2 from the postal (1) section of the form. 

Please ask any other questions. 

I know this task is way out of my skill level and the boss should know this but he isn't and there for he has already stated he will be upset if i don't get this done by the end of this week. 
 

image.png

image.png

image.png

image.png

image.png

image.png

Link to comment
Share on other sites

Link to post
Share on other sites

4 hours ago, abass44 said:

image.png.519c48f485941f102ee41283a14ccdf7.pngimage.png.e13e58522e6f49dae6968203fe94db95.pngimage.png.519c48f485941f102ee41283a14ccdf7.png
Below is a image of the form that HR uses for parents in the database. 
image.png.519c48f485941f102ee41283a14ccdf7.png

As you can see the name Postal has the Number 1 next to it. This matches up with the column in the database called "add_num" 

(ignore the email fields in the image)

image.png.d761982a72268b2c10e6b2077b499628.png
 

Now the part i need to focus on will be the Mother (4) and Father (5) section of the form.
What i need to do, is go through the DB and check to see if the PAR_CODE has a entry for the Mother (4) and Father (5) and if they do, split the email into the correct email field according to the email address type. Email1 (e_mail) will be used for personal email and Email 2 (email_2) will be used for their business email address. 
As shown for the mother in this example below:

image.png.f62e3187519f6414edec5cc922e55b12.png

Database view: 
image.png.e13e58522e6f49dae6968203fe94db95.png

 

 

The part i guess i will say i am stuck on is, if there is not an entry for this individual with the par_code (Parent Code) .

i need to have it check to see if there is an entry by matching the par_code(parent code) and add_num value of 4 (mother) or 5 (father)  and if it doesn't find an entry with the par_code that has a add_num column of 4 or 5 to proceed and create a new entry. Taking the email from e_mail and e_mail2 from the postal (1) section of the form. 

Please ask any other questions. 

I know this task is way out of my skill level and the boss should know this but he isn't and there for he has already stated he will be upset if i don't get this done by the end of this week. 
 

image.png

image.png

image.png

image.png

image.png

image.png

@miniboisSo all this rubbish i just said apparently is irrelevant. 

All i need to do is create two new entries in the table for each par_code by using the email from email1 and email2. 

Email 1 will be the mothers account that will be assigned the number 4 in the add num column and the Father will be using the email 2 and assigned the number 5 in the add_num column. 

Does this make sense?

The other thing i know you will be able to help me work out which will be similar to the email field is how to split the two peoples names from the column "par_name"
This will generally look like this: 
image.png.063684297a4e47def2b5e2ef21e0a31f.png

 

However not all entries will look that clean.  some might have the Mr XXX first or just Mr and Mrs XXX. 
The data is very inconstant which we will be trying to clean up over time. 

 

So far my code looks like this, but will need to split the names for Par_code as stated above. Alternatively, i would like it to not create an entry if it already has a parent_code with the add_num field of 4 and 5.


UPDATE: Okay i was able to work out the name part. Is it possible to put this into one query or do i need to keep them as two seperate queries? 
Also still would like it to check if the data exist and if not then insert into. 

---- MOTHER DATA--
INSERT INTO paraddress
SELECT cmpy_code
      ,par_code
      ,add_num = 4
      ,SUBSTRING (par_name ,0,CHARINDEX(' AND ',par_name)) AS par_name
      ,par_name2
      ,addr1
      ,addr2
      ,addr3
      ,town_sub
      ,state_code
      ,post_code
      ,country
      ,dpid_text
      ,home_phone
      ,bus_phone
      ,fax
      ,salutation
	  ,SUBSTRING (e_mail ,0,CHARINDEX(';',e_mail)) as e_mail
      ,link_address
      ,link_phone
      ,link_e_mail
      ,old_mobile
      ,mobile1
      ,sms_flg1
      ,link_mobile1
      ,mobile2
      ,sms_flg2
      ,link_mobile2
      ,dpid
      ,latitude
      ,longitude
      ,unit_type
      ,unit_nbr
      ,floor_type
      ,floor_nbr
      ,lot_nbr
      ,bldg_prop_name
      ,postal_delivery_type
      ,postal_delivery_nbr
      ,street_number_1
      ,street_number_2
      ,street_name
      ,street_type
      ,street_sfx
      ,tokenised_flg
	  ,e_mail2
  	  ,link_e_mail2 
	  
FROM paraddress as tb2

WHERE add_num = '1'

------FATHER DATA------
INSERT INTO paraddress 
SELECT cmpy_code
      ,par_code
      ,add_num = 5
      ,SUBSTRING(par_name ,CHARINDEX('AND',par_name)+3,LEN(par_name)) AS par_name
      ,par_name2
      ,addr1
      ,addr2
      ,addr3
      ,town_sub
      ,state_code
      ,post_code
      ,country
      ,dpid_text
      ,home_phone
      ,bus_phone
      ,fax
      ,salutation
      ,SUBSTRING(e_mail ,CHARINDEX(';',e_mail)+1,LEN(e_mail)) as e_mail
      ,link_address
      ,link_phone
      ,link_e_mail
      ,old_mobile
      ,mobile1
      ,sms_flg1
      ,link_mobile1
      ,mobile2
      ,sms_flg2
      ,link_mobile2
      ,dpid
      ,latitude
      ,longitude
      ,unit_type
      ,unit_nbr
      ,floor_type
      ,floor_nbr
      ,lot_nbr
      ,bldg_prop_name
      ,postal_delivery_type
      ,postal_delivery_nbr
      ,street_number_1
      ,street_number_2
      ,street_name
      ,street_type
      ,street_sfx
      ,tokenised_flg
      ,e_mail2
	  ,link_e_mail2 

FROM paraddress as tb2

WHERE tb2.add_num = 1

 

Link to comment
Share on other sites

Link to post
Share on other sites

3 hours ago, abass44 said:

The other thing i know you will be able to help me work out which will be similar to the email field is how to split the two peoples names from the column "par_name"
This will generally look like this: 
image.png.063684297a4e47def2b5e2ef21e0a31f.png

 

However not all entries will look that clean.  some might have the Mr XXX first or just Mr and Mrs XXX. 
The data is very inconstant which we will be trying to clean up over time.

You should really think about normalizing your database, as @minibois said.

 

You're doing things in your database that have no business being there. It'll make your queries a lot slower and require tons of workarounds in the end. How you present parents (e.g. "and" between them, female first, male second, etc.) is presentation logic. That has no business being in your data layer, since requirements for this can change in the future.

 

Have a table for persons, then have a separate table for relationships between them.

{ person_id, first_name, last_name, … }

{ relation_id, child_id, parent_id }

 

You could also add a relationship status. For example, what about people who have natural parents and foster parents? How would you present those in your combined column? This makes queries very awkward as you already discovered, require multiple queries and then additional processing in your code when you have a result.

 

With normalized data, querying e.g. only for the mother would become a lot easier.

select …
  from person child
  join relation r on r.child_id = child.person_id
  join person mother on mother.person_id = r.parent_id and mother.sex = 'f'

 

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

1 hour ago, Eigenvektor said:

You should really think about normalizing your database, as @minibois said.

 

You're doing things in your database that have no business being there. It'll make your queries a lot slower and require tons of workarounds in the end. How you present parents (e.g. "and" between them, female first, male second, etc.) is presentation logic. That has no business being in your data layer, since requirements for this can change in the future.

 

Have a table for persons, then have a separate table for relationships between them.

{ person_id, first_name, last_name, … }

{ relation_id, child_id, parent_id }

 

You could also add a relationship status. For example, what about people who have natural parents and foster parents? How would you present those in your combined column? This makes queries very awkward as you already discovered, require multiple queries and then additional processing in your code when you have a result.

 

With normalized data, querying e.g. only for the mother would become a lot easier.


select …
  from person child
  join relation r on r.child_id = child.person_id
  join person mother on mother.person_id = r.parent_id and mother.sex = 'f'

 

I understand, however that would be out of my hands. On top of that, the database belongs to the software TASS and we don't plan to go around modifying it. This is the only case at this stage where we are inserting data from the back end. Once this is done, it's up to HR to ensure, the data is put in correctly and we will most likely generate a report weekly to ensure that they are. 

I have ended up getting it to do what we want now anyway. Just waiting the go ahead to actually run it. 


Either way, Thank you for the info i will look back at this maybe at a later date if need be. 

Also thanks to @Minibois for all the help you have provided. 

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×