
Explain Classification vs Regression

In this mock interview, Angie asks Raj (MLE @ Snapchat) to discuss the “differences between classification and regression in machine learning.” Below is a supplemental written solution that shows how to approach the question and follow-up questions.

Answer

Classification and regression refer to the type of outcome predicted by a supervised machine learning algorithm. In classification, the model predicts a category; in the simplest case, a yes or a no. In regression, the model predicts a numerical, continuous value, for example, a person's height.
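To make the distinction concrete, here is a minimal sketch in Python. It assumes made-up toy data and uses a 1-nearest-neighbor rule purely for brevity; the names `ages`, `heights_cm`, `is_adult`, and `nearest_neighbor_predict` are hypothetical and not from the interview. The same prediction rule is applied twice, and the only thing that changes is the type of the target.

```python
# Hypothetical toy data: age in years is the single input feature.
ages = [4, 9, 15, 30]
heights_cm = [102.0, 133.0, 168.0, 175.0]  # numeric target -> regression
is_adult = ["no", "no", "no", "yes"]       # categorical target -> classification


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the target of the training point closest to `query` (1-NN)."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return train_y[closest]


# Regression: the prediction is a number.
predicted_height = nearest_neighbor_predict(ages, heights_cm, 10)

# Classification: the prediction is a category.
predicted_label = nearest_neighbor_predict(ages, is_adult, 28)
```

The algorithm is identical in both calls; whether the task counts as regression or classification depends only on what kind of value is being predicted.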

Your interviewer may want to continue the conversation with a follow-up question. For example, assume you’re asked, “Can you foresee instances where a problem could be framed as either classification or regression? Why might you choose one or the other?”

A strong response would be, “Let's say the outcome is a numerical variable. You could formulate that as a regression problem. However, you could also bin the values into categories. For height, for example, you could bin by range: one bin for low, one for medium, and one for high. That turns it into a classification problem. One argument for classification is that a few categories can be easier for the algorithm to learn than an exact value. With height, the possible values span a wide, continuous range, so predicting whether someone falls in the medium or the high range may be easier than predicting the exact number. A category may also be the more useful thing to predict. So it depends on the use case and what makes the most sense for your particular objective.”
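The binning idea in this response can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `bin_height`, the cutoffs of 160 cm and 180 cm, and the sample heights are all hypothetical choices, not values from the interview.

```python
def bin_height(height_cm, low_cutoff=160.0, high_cutoff=180.0):
    """Map a continuous height to a category.

    The cutoffs are illustrative assumptions, not standard thresholds.
    """
    if height_cm < low_cutoff:
        return "low"
    if height_cm < high_cutoff:
        return "medium"
    return "high"


# Made-up regression-style targets (continuous values).
heights_cm = [152.5, 160.0, 171.2, 179.9, 188.0]

# The same targets recast as classification-style labels.
height_classes = [bin_height(h) for h in heights_cm]
```

Note that 160.0 and 179.9 receive the same label here, so the binned target no longer distinguishes values within a range. Whether that loss of detail is acceptable depends on the use case.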

What makes this answer effective

The answer correctly explains the difference between classification and regression, which comes down to the type of outcome that a machine learning algorithm predicts. It gives a concrete example of each to clarify the difference.