Difference of Arrays
Write a function diff(arrA, arrB) that accepts two arrays and returns a new array that contains all values that are not contained in both input arrays. The order of numbers in the result array does not matter.
Examples
Pythona = [1, 2, 3, 4] b = [3, 4, 5, 6] diff(a, b) # => [1, 2, 5, 6]
Note: we don't care if numbers are out of order or repeated; for example, these two arrays are equivalent since they are re-arrangements of the same values:
Pythona = [1, 2, 1] b = [2, 1, 2] diff(a, b) # => []
It's possible to solve this problem by brute force (comparing every element to the elements in the other list), but can you think of a more efficient solution?
There is another data structure related to an array that makes it easier to do these kind of comparisons.
You could use a hash table or a set. A hash table will allow you to check whether an element exists in the other list more easily. Sets—if your language of choice supports them—are an even better option.
Can you solve this problem in linear time?
Questions like this are usually used to test your knowledge of different data structures and reason about their space and time tradeoffs. You may be asked to implement one version and then asked to improve it under a time or space constraint.
It's always a good idea to begin by discussing the 'brute force' solution, and then inquire about time and space constraints. This will also give you a moment to consider the problem from different angles.
The brute force solution to this problem is to iterate through each array and check for the presence of every item in the other array. Notice that there's a lot of repeated operations with this approach because it requires looping over each array to check for every item, which results in an O(n^2) runtime.
# Solution 1: Brute force
def diff(arrA, arrB):
result = []
# Check if each item in A is present in B
for i in arrA:
if i not in arrB:
result.append(i)
# Repeat for each item in B
for i in arrB:
if i not in arrA:
result.append(i)
return resultWe can solve the problem much more efficiently by using two hash tables (or sets). We create a hash for each array. Then we loop over each value in the hash and check if the opposite hash contains that value. The final solution looks almost indistinguishable from the brute force version, but since the lookup time of a hash is constant, our runtime performance is reduced to O(n), which is huge improvement.
Check out our lesson on the two pass approach to learn more about it and its applications!
# Solution 2: Hash table comparison
def diff(arrA, arrB):
hashA = {n: True for n in arrA}
hashB = {n: True for n in arrB}
result = []
# Check if each item in A is present in B
for n, _ in hashA.items():
if n not in hashB:
result.append(n)
# Repeat for each item in B
for n, _ in hashB.items():
if n not in hashA:
result.append(n)
return resultAnother way to solve this problem is to think about it in mathematical terms. Ultimately, we're being asked to find the symmetric difference (or XOR) of two sets of numbers. Most languages have built-in support for sets, and we can leverage this to simplify our solution a lot! Here's how we could use Python's set arithmetic to solve the question in one line:
# Solution 3: Use set arithmetic
def diff(arrA, arrB):
return list(set(arrA) ^ set(arrB))
#include <iostream> #include <vector> #include <unordered_set> using namespace std; vector<int> diff(const vector<int>& A, const vector<int>& B) { unordered_set<int> elements; vector<int> result; for (const auto& element : A) { elements.insert(element); } for (const auto& element : B) { if (elements.find(element) == elements.end()) { result.push_back(element); } else { elements.erase(element); } } for (const auto& element : elements) { result.push_back(element); } return result; } int main() { vector<int> A = {1, 2, 3, 4}; vector<int> B = {3, 4, 5, 6}; vector<int>differ = diff(A,B); sort(differ.begin(),differ.end()); for(auto difference : differ) { cout << difference << " "; } return 0; }