Given two strings s1 and s2. Find out the longest common substring.
Lets solve this problem using Dynamic Programming.
We assume a 2D matrix such that m[i][j] is the maximum length of LCS in substring s1[0..i] and s2[0..j].
Using this recursive formula m[s1.size()][s2.size()] will give the maximum length of the common substring.
Now, we can easily use this matrix to get the maximum length.
Output:
Lets solve this problem using Dynamic Programming.
We assume a 2D matrix such that m[i][j] is the maximum length of LCS in substring s1[0..i] and s2[0..j].
Recursive formula to get the length of LCS:
m[i][j] = 0 if(i==0, j==0) = m[i-1][j-1] + 1 if(s1[i] == s2[j]) = max( m[i][j-1], m[i-1][j] ) if(s1[i] != s2[j])
Using this recursive formula m[s1.size()][s2.size()] will give the maximum length of the common substring.
Code to get the LCS matrix:
int **getLCSMatrix(string s1, string s2) { int r = s1.size() + 1, c = s2.size() + 1; int **m = (int**) malloc(r * sizeof(int *)); for (int i = 0; i < c; i++) m[i] = (int *) malloc(c * sizeof(int)); for (int i = 0; i < r; i++) m[i][0] = 0; for (int i = 0; i < c; i++) m[0][i] = 0; for (int i = 1; i < r; i++) { for (int j = 1; j < c; j++) { if (s1[i - 1] == s2[j - 1]) { m[i][j] = m[i - 1][j - 1] + 1; } else { m[i][j] = max(m[i - 1][j], m[i][j - 1]); } } } return m; }
Now, we can easily use this matrix to get the maximum length.
Output:
LCS length: 3
Reading out an LCS:
The following code backtracks the choices taken when computing the LCS matrix. If the last characters in the prefixes are equal, they must be in an LCS. If not, check what gave the largest LCS of keeping s1[i] and s2[j], and make the same choice. Just choose one if they were equally long. Call the function with i=s1.size() and j=s2.size().
string getLCS(int **m, int i, int j, string s1, string s2) { if (i == 0 || j == 0) return ""; if (s1[i - 1] == s2[j - 1]) return getLCS(m, i - 1, j - 1, s1, s2) + s1[i - 1]; else if (m[i - 1][j] < m[i][j - 1]) return getLCS(m, i, j - 1, s1, s2); else return getLCS(m, i - 1, j, s1, s2); }
Reading out all the LCS:
setgetAllLCS(int **m, int i, int j, string s1, string s2, string s) { set v; if (i == 0 || j == 0) { v.insert(s); return v; } if (s1[i - 1] == s2[j - 1]) { set v1 = getAllLCS(m, i - 1, j - 1, s1, s2, s1[i - 1] + s); v.insert(v1.begin(), v1.end()); } else if (m[i - 1][j] == m[i][j - 1]) { set v1 = getAllLCS(m, i - 1, j, s1, s2, s); set v2 = getAllLCS(m, i, j - 1, s1, s2, s); v.insert(v1.begin(), v1.end()); v.insert(v2.begin(), v2.end()); } else if (m[i - 1][j] < m[i][j - 1]) { set v1 = getAllLCS(m, i, j - 1, s1, s2, s); v.insert(v1.begin(), v1.end()); } else { set v1 = getAllLCS(m, i - 1, j, s1, s2, s); v.insert(v1.begin(), v1.end()); } return v; }
You can find the complete source code on my github page: Here
It is a solution for longest common sub-sequence not longest common substring
ReplyDelete