Skip to main content

List the Difference Between Two Strings

MediumPremium

Given two strings of uppercase letters source and target, list (in string form) a sequence of edits to convert from source to target that uses the fewest edits possible.

For example, with strings source = "ABCDEFG", and target = "ABDFFGH" we might return: ["A", "B", "-C", "D", "-E", "F", "+F", "G", "+H"

More formally, for each character C in source, we will either write the token C, which does not count as an edit; or write the token -C, which counts as an edit.

Additionally, between any token that we write, we may write +D where D is any letter, which counts as an edit.

At the end, when reading the tokens from left to right, and not including tokens prefixed with a minus-sign, the letters should spell out target (when ignoring plus-signs.)

In the example, the answer of A B -C D -E F +F G +H has total number of edits 4 (the minimum possible), and ignoring subtraction-tokens, spells out A, B, D, F, +F, G, +H which represents the string target.

If there are multiple answers, use the answer that favors removing from the source last.

If X and Y have the same starting letter, we can write it and then write our solution to the problem involving strings X[1:] and Y[1:].

For example,

function solve(X, Y): if X[0] == Y[0]: return X[0] + " " + solve(X[1:], Y[1:]) else: #TODO

What are our options if X and Y have a different letter?

Say X starts with 'A' and Y starts with 'B'. Then either we subtracted A ('-A'), and write our solution to the problem for (X[1:], Y) or we added B ('+B') and then wrote our solution to the problem for (X, Y[1:]).

For example,

function solve(X, Y): if X[0] == Y[0]: return X[0] + " " + solve(X[1:], Y[1:]) else: ans1 = solve(X[1:], Y) ans2 = solve(X, Y[1:]) if ans1.token_length <= ans2.token_length: return "-" + X[0] + " " + ans1 else: return "+" + Y[0] + " " + ans2

Say we knew the minimum number of edits required for the problem with strings X[i:], Y[j:]. How might we use that to find the actual edits themselves?

How might we avoid repeating work, such as calculating the answer to the problem for inputs (X[3:], Y[3:]) multiple times?

This problem is easiest to attempt in two parts.

First, try to find how long the final answer for the strings source[i:] and target[j:] must be, using dynamic programming. Second, try to reconstruct the answer using the following logic: use two pointers to do the second part, If source[i] == target[j] we write source[i] in the answer. If j is invalid or dp(i+1, j) <= dp(i, j+1) we write -source[i] in the answer. Otherwise, we write +target[j].