Jump to content

Need help with algorithm (dynamic programming)

You are given n sequences.

Example :

GGATATAAAAAC
GATAACCGCGCAGTGATGAGA
TGATGAGATGGGGATATAAAA
AGATAGATGATAACCGCGCAGT
ATGGGGATATAAAAACTTTTTT

You must compute the shortest sequence which contains all the given sequences as subsequences. For the given example, that would be :

AGATAGATGATAACCGCGCAGTGATGAGATGGGGATATAAAAACTTTTTT

I think this problem could be rewritten as a problem of finding the shortest hamiltonian path/cycle in a graph. I can't really figure out how to make the input data into a graph which makes sense in this context though.

Or if you guys have another idea , please share. I'm pretty sure it can be solved using DP without making it entirely into a graph problem.

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to comment
https://linustechtips.com/topic/566624-need-help-with-algorithm-dynamic-programming/
Share on other sites

Link to post
Share on other sites

This might point you in the right direction, at least for the overlapping part of the problem:

http://stackoverflow.com/questions/1285434/efficient-algorithm-for-string-concatenation-with-overlap

 

The part of the problem that deals with the order in which you attempt to find and apply overlaps (given that you're dealing with multiple sequences at once) is another big part of the problem.

Link to post
Share on other sites

4 minutes ago, clegger said:

This might point you in the right direction, at least for the overlapping part of the problem:

http://stackoverflow.com/questions/1285434/efficient-algorithm-for-string-concatenation-with-overlap

 

The part of the problem that deals with the order in which you attempt to find and apply overlaps (given that you're dealing with multiple sequences at once) is another big part of the problem.

Backtrack with KMP could also work , but it's not fast enough.

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to post
Share on other sites

So after some documentation I've found out this is called the shortest common supersequence problem which can be derived from the longest common subsequence problem. For arbitrary many inputs , the problem is NP.

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×