LRU Cache
Your program is running slowly because it's accessing data from disk over and over again. To improve the performance, you want to build a simple key-value store to cache this data in memory, but you also want to limit the amount of memory used. You decide to build a caching system that only keeps the N most recently used items—also known as a least recently used (LRU) cache.
Write a class LRUCache(n) that accepts a size limit n. It should support a set(key, value) method for inserting or updating items and a get(key) method for retrieving items. Can you implement a solution where both of these methods run in O(1) time?
```python
class LRUCache:
    def __init__(self, n): ...      # n is the size limit
    def set(self, key, value): ...  # insert or update an item
    def get(self, key): ...         # retrieve an item, or None if missing
```
Examples
To make room for new items, the least recently used item should be removed. An item is 'used' whenever it is set, retrieved, or updated.
```python
cache = LRUCache(2)  # Limit of 2 items
cache.set('user1', 'Alex')
cache.set('user2', 'Brian')
cache.set('user3', 'Chris')
```
Here `get('user1')` returns None because user1 was the least recently used item and was removed to make room for user3:
```python
cache.get('user1') # => None
cache.get('user2') # => 'Brian'
cache.get('user3') # => 'Chris'
```
If you're having trouble getting started, think about which data structure(s) you can use to solve this problem. Which one is best for a key-value store? Which one can help you store items in order of recency?
You'll need to be able to find and remove the least recent item in O(1) time, so some sort of ordered list is necessary. An array won't work because moving items around takes O(n) time. Could you combine two different data structures to maintain this order and also get fast lookups?
You can use a doubly linked list to store the items in the correct order. When an item is used, just move its node to the front of the list. The least recently used node will always end up at the end of the list.
To look up items without traversing the entire list, you can also add a hash table that maps from the key to the item's node.
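As a rough sketch of how the two structures fit together (the names here are illustrative, not part of the final solution): the hash table maps each key directly to its list node, so a lookup never has to walk the list.

```python
class Node:
    """A doubly linked list node holding one cache entry."""
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.prev = self.next = None

nodes = {}           # key -> Node: O(1) lookup by key
head = Node('a', 1)  # most recently used item sits at the head
nodes['a'] = head

node = nodes.get('a')  # find the node directly, no traversal
print(node.val)        # => 1
```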
What's the time complexity of your solution?
This problem is challenging because of its combination of constraints: key-value lookup, a size limit, and O(1) time. Ultimately, there is no single data structure that fulfills all of these needs, so we have to get creative!
Our solution is to implement the cache class with a linked list and a hash table. The linked list is doubly-linked, which allows us to insert and remove items in constant time without traversing the entire list. Every time an item is set, updated, or retrieved, we just move its Node to the front of the list. When the cache reaches its size limit, we can easily make room by removing the item at the end of the list.
The linked list gives us O(1) insertion time, but by itself it has O(n) lookup time because we have to traverse the entire list to find a particular item. That's not ideal since we want to be able to look up an item quickly using its key. That's where the hash table comes in: we can use it to add a mapping from an item's key to its Node. Together, these data structures give us a time complexity of O(1) on average for insertion, deletion, and retrieval, and a space footprint of O(n).
```python
# Helper class for doubly linked list
class Node:
    def __init__(self, key, val):
        self.next = None
        self.prev = None
        self.key = key
        self.val = val

class LRUCache:
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.nodes = {}    # key -> Node
        self.start = None  # most recently used
        self.end = None    # least recently used

    def get(self, key):
        # Get node for key if it exists
        node = self.nodes.get(key)
        if not node:
            return None
        # Move this node to front of list
        self.move_to_front(node)
        return node.val

    def set(self, key, val):
        node = self.nodes.get(key)
        if node:
            # Update existing node and move to front
            node.val = val
            self.move_to_front(node)
            return
        if self.count == self.n:
            # No space, so remove last item
            if self.end:
                del self.nodes[self.end.key]
                self.remove(self.end)
        else:
            self.count += 1
        # Finally create and insert the new node
        node = Node(key, val)
        self.insert(node)
        self.nodes[key] = node

    # Private helpers
    def insert(self, node):
        # Reset the node's own links before placing it at the head;
        # otherwise a moved node would keep a stale prev pointer.
        node.prev = None
        node.next = self.start
        if self.start:
            self.start.prev = node
        self.start = node
        if not self.end:
            self.end = node

    def remove(self, node):
        if node.prev:
            node.prev.next = node.next
        if node.next:
            node.next.prev = node.prev
        if self.start == node:
            self.start = node.next
        if self.end == node:
            self.end = node.prev

    def move_to_front(self, node):
        # Remove node and re-insert at head
        self.remove(node)
        self.insert(node)
```
We can use a dictionary to store cache items so that our read/write operations are O(1).

Each time we read or update an existing record, we have to move that item to the back of the cache. This lets us evict the first item in the cache whenever the cache is full and we need to add a new record, also making our eviction O(1).

Instead of a plain dictionary, we will use an ordered dictionary (`collections.OrderedDict`) to store cache items. This allows us to efficiently move items to the back of the cache as well as evict the first item.
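The OrderedDict operations this approach relies on are all O(1); here is a small sketch of the relevant behavior:

```python
from collections import OrderedDict

d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['c'] = 3

# Deleting and re-inserting a key moves it to the back
# (the "most recently used" end).
del d['a']
d['a'] = 1
print(list(d))  # => ['b', 'c', 'a']

# popitem(last=False) pops the front ("least recently used") item.
evicted = d.popitem(last=False)
print(evicted)  # => ('b', 2)
```

Note that OrderedDict also provides `move_to_end(key)`, which promotes a key in place without the delete-and-reinsert step.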
Space complexity is O(n): we store a maximum of n items in the cache.

Time complexity is O(1) for both the set and get methods. In the set method, we check whether the key is in the cache (a constant-time operation) and remove it (also constant time). We also check whether the cache is full and perform an eviction when the key is not found in the cache. Finally, we add the item to the cache.

In the get method, we return None if the key does not exist. For existing keys, we read the value from the cache and call the set method so that the entry is promoted to the back of the cache before returning the value to the user. All of these operations take constant time.
```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, n):
        self.size = n
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return None
        val = self.cache[key]
        self.set(key, val)  # promote to the back
        return val

    def set(self, key, val):
        if key in self.cache:
            del self.cache[key]
        elif len(self.cache) == self.size:
            self.cache.popitem(last=False)
        self.cache[key] = val

# debug your code below
cache = LRUCache(2)  # Limit of 2 items
cache.set('user1', 'Alex')
cache.set('user2', 'Brian')
print(cache.get('user1'))  # => 'Alex'
cache.set('user3', 'Chris')
print("")
print(cache.get('user1'))  # => 'Alex'
print(cache.get('user2'))  # => None
print(cache.get('user3'))  # => 'Chris'
```