How to Answer ML Evaluation Questions
Overview
Evaluation questions assess your knowledge of model performance. Similar to model selection questions, these questions are generally asked in the context of a past project you’ve done or as a follow-up question. Some sample questions include:
- Explain when accuracy would be a good metric and a bad metric to measure how well your model is performing.
- How is MSE calculated? When would this not be a good metric to measure how well your model is performing?
- How is AUC calculated? What are “good” values of AUC and why?
To prepare for these types of questions, review the terms under the “Evaluation methods and metrics” section of our ML Interview Glossary.
How to answer
Before the interview, study and memorize the calculations for the metrics you intend to discuss when evaluating your algorithm. In particular, review MSE vs. MAE, accuracy, precision, recall, and AUC. Study how these metrics get applied and which metrics are ideal for specific scenarios. You should also be prepared to explain why one metric is better than another for a certain use case. The interviewer may ask these questions in the context of a past project, so review the evaluation choices you made in those projects and be ready to justify them.
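One way to cement these calculations is to work through them by hand on tiny inputs. The sketch below uses made-up numbers purely for illustration and computes each metric from its definition, without any ML library:

```python
# Toy example: computing common evaluation metrics from their definitions.
# All numbers below are made up for illustration.

# Regression metrics: MSE squares each error (penalizing large errors
# more heavily), while MAE averages absolute errors.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Classification metrics from confusion-matrix counts (1 = positive class).
labels = [1, 0, 1, 1, 0, 0, 1, 0]
preds  = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
tn = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 0)

accuracy = (tp + tn) / len(labels)   # fraction of all predictions correct
precision = tp / (tp + fp)           # of predicted positives, how many are real
recall = tp / (tp + fn)              # of real positives, how many were caught
```

Being able to reproduce these one-liners on a whiteboard makes the "why MSE over MAE?" follow-up much easier to answer: the squared term in MSE is exactly where its sensitivity to outliers comes from.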
Let’s say your interviewer asks,
“How do you plot the ROC curve? What does the AUC-ROC represent and why is it important?”
Given the multiple moving pieces, even a seemingly simple concept like the AUC-ROC curve can lead to a scattered explanation. Practice explaining the calculation by creating a toy example where you compute the curve by hand. In your answer, cover the main concepts, which include:
- Calculating the true positive rate (TPR) and false positive rate (FPR) for various classification thresholds, and plotting them on a graph where the x-axis = FPR and the y-axis = TPR.
- Explaining why higher AUC values are better.
- Elaborating on TPR and FPR, if probed, and why these metrics are used for classification problems.
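The hand calculation described above can be sketched in a few lines. The labels and scores below are hypothetical values chosen to keep the arithmetic easy; each threshold sweep yields one (FPR, TPR) point, and the area under the resulting curve is the AUC:

```python
# Toy example: plotting a ROC curve by hand.
# At each threshold, predict positive when score >= threshold, then compute
# TPR = TP / (TP + FN) and FPR = FP / (FP + TN).
labels = [0, 0, 1, 1]           # hypothetical ground truth
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical model scores

def tpr_fpr(threshold):
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)

# Sweep thresholds from high to low so the curve runs from (0, 0) to (1, 1).
thresholds = sorted(set(scores), reverse=True) + [0.0]
points = [(0.0, 0.0)] + [(fpr, tpr)
                         for tpr, fpr in (tpr_fpr(t) for t in thresholds)]

# AUC via the trapezoidal rule over the (FPR, TPR) points:
# higher AUC means the model ranks positives above negatives more often.
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

Walking through one threshold out loud (e.g., "at threshold 0.4, we catch one of the two positives, so TPR is 0.5") is exactly the kind of grounded explanation interviewers are listening for.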
In the Evaluating a Model for ML Systems lesson, we describe different evaluation metrics and their appropriate use cases.
Common pitfalls
- Using accuracy as a metric simply because it's the most commonly used and easily understood. Accuracy has pitfalls that make it suboptimal for some scenarios, especially class imbalance. For example, imagine you work for a company trying to detect fraudulent credit card transactions, and only 0.001% of historical transactions are fraudulent. A model that always predicts "not fraud" achieves 99.999% accuracy, yet its recall on the fraud class is 0% (it catches no fraud at all), and its precision is undefined because it never predicts fraud.
- Rambling too long on one part of the problem. To avoid rambling, pause, gather your thoughts, and consider what the interviewer is likely looking for with their question. As simply as possible, state what method or metric works best, and then explain why. Rambling also happens when a candidate feels pressure to cover every detail in a single answer. Don't be afraid to end your answer and allow the interviewer to respond with follow-up questions.
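The accuracy pitfall above is easy to demonstrate numerically. This minimal sketch assumes 100,000 transactions with a single fraud case, and a degenerate "model" that always predicts the majority class:

```python
# Toy illustration of the accuracy pitfall on imbalanced data.
# Assumed setup: 100,000 transactions, exactly 1 fraudulent (positive class).
labels = [1] + [0] * 99_999
preds = [0] * 100_000  # a "model" that always predicts "not fraud"

accuracy = sum(1 for l, p in zip(labels, preds) if l == p) / len(labels)

tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
recall = tp / (tp + fn)  # fraction of frauds actually caught

# accuracy is 0.99999 (near-perfect), but recall is 0.0:
# the model never catches the one fraudulent transaction.
```

This is why, for rare-event problems, precision and recall on the minority class tell you far more than overall accuracy.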
Senior candidates
As expected, senior candidates have slightly different performance expectations. The more senior the role, the more you’re expected to demonstrate your ability to:
- Build the model infrastructure from end to end
- Gauge the pros and cons of using a particular algorithm
- Integrate your domain knowledge from previous roles
- Describe your experience productionizing ML models in previous roles
- Work cross-functionally with both technical and non-technical stakeholders