Measure Algorithm Success
Full prompt: Once you have built an algorithm, how do you know that it works well?
Describe the metrics you would use if this were a classification model versus those you would use for a regression problem.
For classification problems, start with a confusion matrix, and then build on it to describe the metrics you are interested in.
For classification, describe how the change in threshold impacts your metrics.
Why isn’t accuracy always a useful metric?
Describe precision-recall curves and how they differ from ROC curves. How do these plots inform how well your model is performing?
We use several metrics to understand how our model works:
- For classification algorithms, we might use accuracy, F1 scores, precision, recall, AUC, or many other metrics. In general, the closer these metrics are to 1, the better your model is performing.
- For regression algorithms, mean absolute error (MAE) or mean squared error (MSE) is typically used. For both, lower values are better.
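To make the metrics above concrete, here is a minimal sketch in plain Python. The function names and the toy data are made up for illustration; in practice you would likely reach for a library rather than hand-roll these. It builds the confusion matrix counts at a given threshold, derives accuracy, precision, recall, and F1 from them, and computes MAE and MSE for a small regression example.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Return (tp, fp, fn, tn), labelling scores >= threshold as positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, scores, threshold=0.5):
    """Derive common classification metrics from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when nothing is predicted positive
    # or when there are no actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: penalizes large errors more heavily than MAE."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up, imbalanced toy data: only 2 of the 10 examples are positive.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05]

# Lowering the threshold trades false negatives for false positives.
for threshold in (1.1, 0.5, 0.35):
    print(threshold, classification_metrics(y_true, scores, threshold))

print("MAE:", mae([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))
print("MSE:", mse([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))
```

On this toy data, a threshold of 1.1 predicts every example as negative and still scores 0.8 accuracy while recall is 0, which is one way to show why accuracy can mislead on imbalanced classes. Dropping the threshold from 0.5 to 0.35 raises recall from 0.5 to 1.0 here, because the second positive example (score 0.4) is now caught.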
It’s good practice to split your data into training and testing sets and to report metrics on the test set. This helps confirm that your “good” metric scores aren’t due to overfitting on data the model has already seen, and that the model is instead picking up on trends that generalize to new data.
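The sketch below illustrates the point about overfitting using only the Python standard library. Everything in it is an assumption made for illustration: the synthetic data (y = 2x plus noise), the 75/25 split, and the deliberately overfit "model", a one-nearest-neighbor lookup that simply memorizes the training points.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def nearest_neighbor_predict(train_rows, x):
    """Return the target of the closest training point (pure memorization)."""
    nearest = min(train_rows, key=lambda row: abs(row[0] - x))
    return nearest[1]


def mean_absolute_error(rows, predict):
    return sum(abs(y - predict(x)) for x, y in rows) / len(rows)


# Synthetic data: a linear trend plus Gaussian noise.
rng = random.Random(42)
rows = []
for _ in range(200):
    x = rng.uniform(0, 10)
    rows.append((x, 2 * x + rng.gauss(0, 1)))

train_rows, test_rows = train_test_split(rows)


def predict(x):
    return nearest_neighbor_predict(train_rows, x)


train_mae = mean_absolute_error(train_rows, predict)
test_mae = mean_absolute_error(test_rows, predict)

print("train MAE:", train_mae)  # each training point is its own neighbor
print("test MAE: ", test_mae)   # held-out points expose the memorized noise
```

Because the lookup reproduces every training target exactly, the training error is zero and says nothing about quality. Only the held-out error reflects how the model behaves on data it has not seen.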
Eventually, your model will be put into the real world. At this point, the actual model metric values may matter less. Ultimately, there are business metrics you care about driving (e.g., increases in conversion rate, revenue, and engagement). These metrics guide how you judge your model once it moves out of training and into production.
What makes this answer effective
This answer shows an understanding of different metrics, the risk of overfitting, and the importance of considering real-world business implications.
Other considerations
With more time, dive deeper into the details of the different metrics, especially if the interviewer prompts you to do so. You may also discuss how to monitor metrics over time once your model is deployed into production. For example, you can discuss how metrics move after feature updates or as a result of model drift.