Finding the longest common subsequence in a variable amount of sets with no repeating characters?

I'm trying to come up with an efficient algorithm that would work in JavaScript for the longest common subsequence problem. However, there are two main differences between my problem and the one described in the Wikipedia article. The first is that I will have more than two sets of characters. The second is that characters will never repeat in a set. This means that the length of each set will be at most about 50 characters (i.e. the printable ASCII chars).

For example, the sets may contain:

``````A = ZBANICOT
B = ACNTBZIO
C = ANICOTZB
D = ZIANCOTB
``````

... and it should output `ACO`, `ACT`, `ANO`, and `ANT`, since these are the longest subsequences in all 4 sets (as far as I can tell from manually examining these sets).

Since none of the letters repeat, is there a more efficient algorithm I should consider rather than the one described in the Wikipedia article? If not, is there anywhere that describes how to convert the algorithm to having N sets instead of 2?

Not sure how efficient an adaption of the normal LCS algorithm would be, so this may be inefficient, but since there aren't many eligible letters, it's not too horrible.

Make a successor matrix. For each string `s`, for each pair of letters `(s[i],s[j])` with `i < j`, increment `matrix[s[i]][s[j]]`. A pair of letters can only be part of a longest common subsequence if `matrix[a][b] = n = number of strings`. Build a directed graph from the letters participating in such a pair, with an edge from `a` to `b` iff `matrix[a][b] = n`. Find a longest path in that graph.

What you are trying to do is called multiple sequence alignment. The algorithm 'could' be rewritten to be n-dimensional, but that would increase the space complexity to `Length ^ Dimensions`.

Since none of the letters repeat, is there a more efficient algorithm I should consider rather than the one described in the Wikipedia article?

Perhaps. First create a mapping (a,b) such that a < b. For example the mappings for the first three characters of each sets is as follows

``````--- Mapping for character[0]
Z - B    A - C    A - N    Z - I
A        N        I        A
N        I        C        N
I        B        O        C
C        Z        T        O
O        I        Z        T
T        O        B        B

--- Mapping for character[1]
B - A    C - N    N - I    I - A
N        I        C        N
I        B        O        C
C        Z        T        O
O        I        Z        T
T        O        B        B

--- Mapping for character[2]
A - N    N - I    N - C    I - N
I        B        O        C
C        Z        T        O
O        I        Z        T
T        O        B        B
``````

Then retain only mappings that are common on all sets. Here's an example of following this process, traversing Set[A] (`ZBANICOT`), which will give us a set of relations that are common to all sets.

So with the letter `Z` you will observe that in Set[C], the only character that follows `Z` is `B`, so you can eliminate just about every mapping from `Z` that is not to B. But to cut a long story short, every mapping for `Z` is removed. Note, only mappings from Z are removed, not mappings to `Z`.

Also note that since Set[D] ends with `B`, we can safely eliminate all mappings from `B`. By following the process one can observe that `A` maps to `N`, `C`, `O` and `T`. Next is N, which maps to `T` and `O`.

`I` maps to `O` (honestly, "I/O") while `C` maps to `O` and `T` (how cute). And funnily enough `O` maps to nothing!!! How befitting.

Finally there is one more character, and `T` maps to nothing too (since `ZBANICOT` ends with `T`).

In the end you are left with the following mappings:

``````A - C    C - O    I - O    N - O
N        T                 T
O
T
``````

Now all you have to do is start from each of the keys (`A`, `C`, `I` and `N`) and search for the longest sequence without any repeating nodes (characters in this case).

Here are all of the common subsequences (all using the simple algorithm):

``````ACO
ACT
ANO
ANT
AO
AT
CO
CT
IO
NT
NO
``````

What if I stored character mapping for each set of characters and then incremented this count each time...

For the 1st set (`ZBANICOT`), this would create the following character pairs (28 of them):

``````ZB, ZA, ZN, ZI, ZC, ZO, ZT
BA, BN, BI, BC, BO, BT
AN, AI, AC, AO, AT
NI, NC, NO, NT
IC, IO, IT
CO, CT
OT
``````

For the 2nd set (`ACNTBZIO`), I would have 28 pairs again:

``````AC, AN, AT, AB, AZ, AI, AO
CN, CT, CB, CZ, CI, CO
NT, NB, NZ, NI, NO
TB, TZ, TI, TO
BZ, BI, BO
ZI, ZO
IO
``````

For the last 2 sets, I would have:

``````AN, AI, AC, AO, AT, AZ, AB
NI, NC, NO, NT, NZ, NB
IC, IO, IT, IZ, IB
CO, CT, CZ, CB
OT, OZ, OB
TZ, TB
ZB

ZI, ZA, ZN, ZC, ZO, ZT, ZB
IA, IN, IC, IO, IT, IB
AN, AC, AO, AT, AB
NC, NO, NT, NB
CO, CT, CB
OT, OB
TB
``````

(So to figure out the iteration count so far, let `N` equal the number of sets, and `M` be the number of characters in each set, we have `N*(M)((M-1)/2)` or `4*28=112` in this example)

Next, I would count the number of times each pair exists. For example,

``````PairCounts["ZB"] == 3
PairCounts["ZA"] == 2
// ...
PairCounts["AO"] == 4
PairCounts["AT"] == 4
// ...
PairCounts["AN"] == 4
PairCounts["AC"] == 4
PairCounts["CT"] == 4
PairCounts["NO"] == 4
PairCounts["NT"] == 4
``````

Then, since we have 4 pairs, I would discard all pairs which don't have a count of 4. And then I would try concatting each pair in order to form the longest string possible.