Identify When Model Needs Refresh
In this mock interview, Angie asks Raj (MLE @ Snapchat), “How can we tell if a model needs to be refreshed?” Below is a supplemental written solution that shows how to approach the question and follow-up questions.
Answer
“Typically, a model needs to be refreshed when its performance degrades. Generally, you benchmark performance on some held-out evaluation set, and at some point you may notice that performance on production data no longer matches that benchmark. One way to detect this is to track the metrics you chose for the original problem, for example precision, recall, loss, or accuracy. This assumes you have ground-truth labels for the incoming production data, and it isn't always that straightforward because you don't always have a source of ground truth in production. Alternative strategies include monitoring the distributions of your model's input features, as well as the prediction distributions and confidence scores from the algorithm itself. It really depends on your use case, and there's no single best way to do it, but these are different ways of helping solve that problem.”
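To make the input-distribution monitoring idea concrete, here is a minimal sketch using the population stability index (PSI), one common drift statistic: bin a feature on the training distribution's quantiles, then compare the production bin frequencies against the training ones. The function name, sample sizes, and thresholds below are illustrative choices, not something prescribed by the answer.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference (training) sample and a production
    sample of one feature: sum of (actual% - expected%) *
    ln(actual% / expected%) over quantile bins of the reference."""
    # Bin edges come from the reference sample's quantiles,
    # widened to cover any out-of-range production values.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    exp_frac = np.histogram(expected, edges)[0] / len(expected)
    act_frac = np.histogram(actual, edges)[0] / len(actual)
    # Floor the fractions to avoid log(0) and division by zero.
    exp_frac = np.clip(exp_frac, 1e-6, None)
    act_frac = np.clip(act_frac, 1e-6, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)        # feature at training time
prod_stable = rng.normal(0.0, 1.0, 10_000)  # production, same distribution
prod_shifted = rng.normal(0.8, 1.0, 10_000) # production, mean has drifted

print(population_stability_index(train, prod_stable))   # near 0
print(population_stability_index(train, prod_shifted))  # clearly elevated
```

A common rule of thumb is that PSI below 0.1 means little change while values above 0.25 signal a significant shift, which is the kind of alert that would prompt investigating whether the model needs a refresh.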
Let’s say your interviewer wants to continue the conversation through a follow-up question. For example, assume you’re asked, “Can you give me some reasons or some insights for why model performance might actually differ in production versus in development?”
A strong response would be, “There are many possible reasons why this could happen in production. I can give one example, which is something called concept drift, where the relationship between the input features and the outcome variable changes. Another way of thinking about it is that a supervised machine learning model typically represents the conditional distribution P(Y | X). Concept drift is when this underlying distribution changes, so the assumptions that held when you trained your model no longer apply. That can often be a common reason why performance isn't matching what you would expect.”
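A toy simulation can make the definition concrete. In this sketch (an illustrative construction, not from the answer itself), the input distribution P(X) stays fixed, but P(Y | X) flips between training and production, so a frozen model that was perfect at training time fails badly in production:

```python
import numpy as np

rng = np.random.default_rng(1)

# Training time: the label is 1 exactly when the feature is positive.
x_train = rng.normal(0.0, 1.0, 5_000)
y_train = (x_train > 0).astype(int)

# A "model" fit to the training relationship: predict 1 when x > 0.
def predict(x):
    return (x > 0).astype(int)

# Production later: P(X) is unchanged, but P(Y | X) has flipped --
# the label is now 1 when the feature is non-positive (concept drift).
x_prod = rng.normal(0.0, 1.0, 5_000)
y_prod = (x_prod <= 0).astype(int)

train_acc = (predict(x_train) == y_train).mean()
prod_acc = (predict(x_prod) == y_prod).mean()
print(train_acc, prod_acc)  # high at training time, near zero in production
```

Note that monitoring the feature distributions alone would miss this case, since P(X) never changed; this is why prediction-quality metrics (when labels are available) complement input-distribution monitoring.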
What makes this answer effective
This answer correctly identifies that models are typically updated when the performance degrades. It acknowledges that the metric chosen is dependent on the use case, and also demonstrates understanding that ground truth labels may not be present in production. It offers alternative strategies to understand degradation in model performance without ground truth labels.