Tags : String, Subsequence, Dynamic Programming, Memoization
We need to find some shortest possible Strings(meaning that they won't have any unnecessary suffix or prefix) that can produce the names as Subsequences and we have to output :
- How many unique Strings are there?
- What is the length?
-
Subarray/Substring vs Subsequence and Programs to Generate them - GeeksForGeeks
-
Dynamic Programming, Memoization, Tabulation - freeCodeCamp.org (Video)
The problem statement has already mentioned that the length of the names (Strings) will not cross 30. We will take two 2D arrays of size 31 x 31
(although we just need length of name1 * length of name2 for each test case but let's keep it this way just for the sake of ease of implementation and since neither it consumes a lot of memory),
- one for memoizing the length of the shortest String
- another one to keep record of how many unique Strings are generated.
But first let us understand what type of Strings we are counting and to do that let's take an example:
USA
USSR
For the above example (and others), all (meaning that they may contain unnecessary suffix and prefix) the possible Strings whose Subsequences can produce both the name is actually inifinite. For example:
USSRA
abcdUSSRApzxockbvbkpvzxco
USASRadasdasd
USSAR
fwetUSSARrweUSASRadasdasd
dasdsadasdsadcwqrfqtgUSASR
gjoUSSRApidhUSSARglkdfg
dfgdgfUSSAR
USSARgnsdihgisgf
...
...
But the problem has asked us to find the shortest Strings whose Subsequences can produce both of the names. So we don't keep any unnecessary suffix or prefix.
For distinction only and to make them easily detectable, some Substrings are capitalized in each Strings shown above, from which we can obtain both USA
and USSR
.
For example :
From the String fwetUSSARrweUSASRadasdasd
, we can take any of the Substrings USSAR
or USASR
which both of them can produce the Subsequences USA
and USSR
. Remember that both the Substrings and Subsequences must maintain the order of characters of its original String. The only difference between Substrings and Subsequences is that Substrings must contain consecutive characters too while Subsequences has NO such bound.
Now if we just think of shortest unique Strings for this particular test case, those would be only these 3 :
USSRA
USSAR
USASR
And as already explained before, RAUSS
or similar Strings that contain all the characters of both the names but if the characters to produce the names can NOT be retrieved in a Sequential order, are not Subsequences at all, thus we are not counting them nor they are our concern.
Now that we have finally understand what type of Strings we are looking for, let's understand how we are using the arrays to memoize and find the solutions.
At first we will take the names and shift their characters right by 1 index for ease of comparison and implementation in the upcoming steps. Also because at length == 0
they won't have any common characters at all.
As we have shifted the characters to right by 1 index, then we simply create both the matrix for dynamic programming by occupying the first column and first row. We can we do it in any fashion we want. Row and Column represents the characters of each of names. I have chosen Row to represent the first name and the Column to represent the second name, it can be done other way around as well.
For the shortest Strings matrix, we will put the max(row,column)
value to each of blocks in j-th column and i-th row. For example: for (4,0)
and (0,4)
, 4
is the maximum of both the values so those two blocks will have 4
as their value. We are doing this because we are assuming name1 == name2
at the initial stage.
The shortestString
matrix should look like something like this at the initial state:
I/J | 0 | 1 | 2 | 3 | 4 | ... | 30 |
---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2 | 3 | 4 | ... | 30 |
1 | 1 | ||||||
2 | 2 | ||||||
3 | 3 | ||||||
4 | 4 | ||||||
... | ... | ||||||
30 | 30 |
Now to memoize the shortestString matrix, we must check the characters of a name in respect to the other one. Each block
represents the common length of the shortest string required for both of the names' 1st character to those particular indice
of those names respectively. (Remember, we already shifted the characters by 1
character.)
For Example : shortestString[1][3]
's value
is the length required when I am taking name1's to substring length = 1 (USA
's length 1 substring == "U
") and name2's substring length = 3 (USSR
's length 3 substring == "USS
"). So, shortestString[1][3]
= 3
is required as the bare minimum length for U
and USS
.
i/j | 0 | 1(U) | 2(S) | 3(S) |
---|---|---|---|---|
0 | 0 | 1 | 2 | 3 |
1(U) | 1 | 1(U) | 2(US) | 3(USS) |
We just simply keep adding 1
to the last updated cumulative sum. Let's understand what to do on which condition :
-
Case name1[i] == name2[j] : Since both of the
characters
are same, we don't need to extend the string length. We can just add1
to previous common length from upper-left block, which isshortestString[i-1][j-1]
. Look at the table below for clarification of an example:-
Example :
i[2] == j[3]
For "US" of "USA" (since i == 2) and "USS" of "USSR" (since j == 3)
So,
shortestString[2][3] = shortestString[2-1][3-1] + 1 = 2 + 1 = 3
-
-
Case name1[i] != name2[j] : Since the
characters
don't match, we will take the minimum (look at example to understand why not maximum) length encountered betweenfirst name
's length andsecond name
's length. For clarification let us again look at the table with an example:-
Example :
i[1] != j[2]
So,
shortestString[2][3] = min(shortestString[2][2],shortestString[1][3]) + 1 = shortestString[2][2] + 1 = 3
if we had taken maximum then it had been
shortestString[2][3] = shortestString[1][3] + 1 = 4
But do we really need
USSS
to get name1's2
length substringUS
and name2's3
length substringUSS
? In other words, by taking maximum we are actually consideringUS
andUSSS
which is becauseUSS
is already included inshortestString[1][3]
; thus unnecessary suffix has been added.
-
In example of USA
and USSR
, it will look something like this after memoization.
i/j | 0 | 1(U) | 2(S) | 3(S) | 4(R) |
---|---|---|---|---|---|
0 | 0 | 1 | 2 | 3 | 4 |
1(U) | 1 | 1(U) | 2(US) | 3(USS) | 4(USSR) |
2(S) | 2 | 2(US) | 2(US) | 3(USS) | 4(USSR) |
3(A) | 3 | 3(USA) | 3(USA) | 4 (USA + S) [from j=3] / (USS + A)from[i=3] =(USAS/USSA) |
5 ((USAS/USSA) + R)) /(USSR + A) =(USASR/USSAR/USSRA) |
And shortestString[name1.length()-1][name2.length()-1]
give us the length of the shortest string which is 5
.
For the unique String matrix, we occupy the 1st column and 1st row with 1
s because we are assuming that the number of unique string is 1
, simply name1 = name2
meaning there is no difference of characters.
The uniqueString
matrix should look like this initially:
I/J | 0 | 1 | 2 | 3 | 4 | ... | 30 |
---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 1 | 1 | ... | 1 |
1 | 1 | ||||||
2 | 1 | ||||||
3 | 1 | ||||||
4 | 1 | ||||||
... | ... | ||||||
30 | 1 |
While populating there can be these cases :
-
Case name1[i] == name2[j] : When the characters matches, we don't need to create/branch out to another new sequence and thus we simply just update the value of the current block by taking the number of unique string from upper-left block.
-
Case name1[i] != name2[j] && shortestString[i][j - 1] == shortestString[i - 1][j] : It means there had been 2 unique strings sequence of the same length as we can already see in the
shortestString
matrix. And we need to take both the sequence in account.-
Example :
shortestString[3][3]
is populated by this way. How? name1 length 3 substring isUSA
and name2 length 3 substring isUSS
.shortestString[2][3]
isUSS
andshortestString[3][2]
isUSA
and neither of them can produce the other name's substring upto that length. So we can add the unique sequence characterA
of name1 to name2's length 3 substringUSS+A
; also if we do it the other way around, add the unique sequence characterS
of name2 to name1's length 3 substringUSA+S
, is valid and has to be counted as per question requirement. So, we are adding the number of unique strings found in those states.shortestString[3][3] = number of unique strings found for name1 length 3 substring + number of unique strings found for name2 length 3 substring = shortestString[3][2] + shortestString[2][3] = 2
-
-
Case name1[i] != name2[j] && shortestString[i][j - 1] != shortestString[i - 1][j] : In this case we just simply take the minimum of which can be taken from left block or upper block. The logic is similar to
shortestString[][]
population. We don't want unnecessary suffix to be added.
If we have done everything accordingly, uniqueString
matrix should look like this :
i/j | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 1 | 1 | 1 | 1 | 1 |
1 | 1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 1 | 1 | 1 |
3 | 1 | 1 | 1 | 2 | 3 |
Now, uniqueString[name1.length()-1][name2.length()-1]
tells us how many unique Strings are there.
The above implementation is accepted
.
Caution : The matrix to memoize the number of unique strings, must be of an integer data type that can hold 263.
Notes :
- You can solve the problem recursively too and it won't throw any MLE.
long
data type's size ofc++
is dependent on the hardware (processor) and the software (operating system and compiler).LightOJ
uses a 64-bit compiler so onlylong
is sufficient. However we might need to uselong long
if any of the dependency is 32-bit.
#include <bits/stdc++.h>
using namespace std;
int main()
{
ios::sync_with_stdio(false);
cin.tie(NULL);
cout.tie(NULL);
int testCases;
string name1, name2;
int shortestString[31][31];
long uniqueString[31][31];
cin >> testCases;
for (int testCase = 1; testCase <= testCases; testCase++)
{
cin >> name1 >> name2;
//Shift the characters of the name to right for ease of memoizing
name1.insert(0, "0");
name2.insert(0, "1");
//Prepare the matrices for memoization
for (int i = 0; i < 31; i++)
shortestString[0][i] = shortestString[i][0] = i, uniqueString[i][0] = uniqueString[0][i] = 1;
for (int i = 1; name1[i]; i++)
{
for (int j = 1; name2[j]; j++)
{
//Checking if we need to take the cumulative sum from upper-left block
if (name1[i] == name2[j])
{
//Adding 1 to cumulative sum from upper-left block
shortestString[i][j] = 1 + shortestString[i - 1][j - 1];
//No need to add a new branch of unique strings so taking cumulative sum from upper-left block
uniqueString[i][j] = uniqueString[i - 1][j - 1];
}
else
{
//Finding the minimum from left and upper block and adding 1 to the value of current block
shortestString[i][j] = 1 + min(shortestString[i][j - 1], shortestString[i - 1][j]);
//Checking if there are two unique strings of the same length
if (shortestString[i][j - 1] == shortestString[i - 1][j])
uniqueString[i][j] = uniqueString[i][j - 1] + uniqueString[i - 1][j];
//Checking if left block has the minimum value in shortestString matrix
else if (shortestString[i][j - 1] < shortestString[i - 1][j])
uniqueString[i][j] = uniqueString[i][j - 1];
else
uniqueString[i][j] = uniqueString[i - 1][j];
}
}
}
cout << "Case " << testCase << ": " << shortestString[name1.length() - 1][name2.length() - 1] << " " << uniqueString[name1.length() - 1][name2.length() - 1] << "\n";
}
return 0;
}
#include <bits/stdc++.h>
using namespace std;
string name1, name2;
int shortestString[31][31];
long uniqueString[31][31];
void memoizeArrays(int i, int j){
if(i == name1.length()) //base case
return;
if(j == name2.length()) //check next row and set column to 1
return memoizeArrays(++i, 1);
else{
if (name1[i] == name2[j]){
//Adding 1 to cumulative sum from upper-left block/current common length of both string
shortestString[i][j] = 1 + shortestString[i - 1][j - 1];
//No need to add a new branch of unique strings so taking cumulative sum from upper-left block
uniqueString[i][j] = uniqueString[i - 1][j - 1];
}else{
//Finding the minimum from left and upper block and adding 1 to the value of current block
shortestString[i][j] = 1 + min(shortestString[i][j - 1], shortestString[i - 1][j]);
//Checking if there are two unique strings of the same length
if (shortestString[i][j - 1] == shortestString[i - 1][j])
uniqueString[i][j] = uniqueString[i][j - 1] + uniqueString[i - 1][j];
//Checking if left block has the minimum value in shortestString matrix
else if (shortestString[i][j - 1] < shortestString[i - 1][j])
uniqueString[i][j] = uniqueString[i][j - 1];
else
uniqueString[i][j] = uniqueString[i - 1][j];
}
//Check next column of the same row
return memoizeArrays(i, ++j);
}
}
int main()
{
ios::sync_with_stdio(false);
cin.tie(NULL);
cout.tie(NULL);
int testCases;
cin >> testCases;
for (int testCase = 1; testCase <= testCases; testCase++)
{
cin >> name1 >> name2;
//Shift the characters of the name to right for ease of memoizing
name1.insert(0, "0");
name2.insert(0, "1");
//Prepare the matrices for memoization
for (int i = 0; i < 31; i++)
shortestString[0][i] = shortestString[i][0] = i, uniqueString[i][0] = uniqueString[0][i] = 1;
//Memoize the array
memoizeArrays(1, 1); //Remember, we shifted characters to the right by 1 character
cout << "Case " << testCase << ": " << shortestString[name1.length() - 1][name2.length() - 1] << " " << uniqueString[name1.length() - 1][name2.length() - 1] << "\n";
}
return 0;
}