Logistic Regression Concepts
In the real world, predicting continuous values is certainly a common problem, but it is perhaps even more common to face a situation where a prediction must be made on a binary outcome, known as a classification task. This is simply because binary questions come up in so many different settings. Some examples include:
- Predicting whether a customer will convert (yes/no)
- Determining if an email is spam (yes/no)
- Classifying whether a transaction is fraudulent (fraud/no fraud)
- Detecting whether an image contains a particular object (cat/no cat)
- Diagnosing a disease (cancer/no cancer)
For such problems, logistic regression is the most commonly used type of model. In logistic regression, the target variable can be only one of two values (0 or 1) as opposed to a continuous value (can be any number). While linear regression is used to predict such continuous values (the price of a house, salary of a baseball player), logistic regression estimates the probability that an outcome will occur, making it ideal for classification.
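To make this concrete, here is a minimal from-scratch sketch of fitting a logistic regression by gradient ascent on the log-likelihood, using NumPy and a small synthetic dataset (the data, learning rate, and iteration count are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-outcome data: the true log-odds are a linear
# function of two features.
X = rng.normal(size=(200, 2))
true_logits = X @ np.array([2.0, -1.0]) + 0.5
y = (rng.random(200) < 1 / (1 + np.exp(-true_logits))).astype(float)

# Fit by gradient ascent on the log-likelihood.
beta = np.zeros(2)
intercept = 0.0
lr = 0.1
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ beta + intercept)))  # predicted probabilities
    beta += lr * X.T @ (y - p) / len(y)            # gradient w.r.t. coefficients
    intercept += lr * np.mean(y - p)               # gradient w.r.t. intercept

probs = 1 / (1 + np.exp(-(X @ beta + intercept)))
accuracy = np.mean((probs >= 0.5) == y)
```

Note that the model outputs probabilities, which are then thresholded (here at 0.5) to produce class predictions.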
Logistic regression model overview
In linear regression, we use the following formula to model the relationship between the independent variables and the dependent variable:

y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ

Where:
- y is the continuous dependent variable (e.g., sales)
- x₁, x₂, …, xₙ are the independent variables (predictors)
- β₀, β₁, …, βₙ are the coefficients (parameters)
In logistic regression, the dependent variable is binary, so the linear model is modified. Instead of predicting y directly, we predict the log-odds of the outcome:

ln(p / (1 − p)) = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ

Where:
- p is the probability of the outcome (e.g., conversion)
- p / (1 − p) is the odds of the outcome (think 3:1 odds, or 2:1 odds)
- The rest of the equation remains the same as in linear regression, with x₁, …, xₙ as the predictors and β₀, β₁, …, βₙ as the coefficients
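As a quick sanity check of these definitions, the conversion between probability, odds, and log-odds is easy to verify in a few lines (a minimal sketch; the probability 0.75 is just an illustrative value):

```python
import math

p = 0.75                                     # probability of the outcome
odds = p / (1 - p)                           # 3.0, i.e., "3:1 odds"
log_odds = math.log(odds)                    # what the linear model actually predicts
p_recovered = 1 / (1 + math.exp(-log_odds))  # inverting recovers the probability
```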
Relationship to the sigmoid function
Working from the formula for logistic regression above, we can solve for the probability p by exponentiating both sides, taking advantage of the fact that Euler's number and the natural logarithm are inverses of each other. Doing so, we arrive at the following formula, known as the sigmoid function:

p = 1 / (1 + e^−(β₀ + β₁x₁ + … + βₙxₙ))
Graphing the sigmoid function generates the following chart:

(Figure: the S-shaped sigmoid curve, rising from 0 to 1.)
There are a few important things to note here.
Firstly, our initial linear combination of features has no upper or lower bound. Depending on the sizes of the values and coefficients, a prediction can fall anywhere from negative infinity to positive infinity. This is fine for predicting continuous values, like the price of a house, but doesn’t work if we want to predict binary outcomes. The sigmoid function changes this by transforming our linear combination of features into a predicted value bounded between 0 and 1, which represents the probability of a given outcome.
Secondly, the sigmoid function is an S-shaped curve with a few important characteristics:
- Midpoint behavior: The midpoint of the sigmoid function occurs when the log-odds equal zero, which corresponds to a probability of 0.5. At this point, the model is essentially uncertain about the outcome—it’s equally likely to predict either class (0 or 1). The curve around this point is steep, meaning that small changes in the log-odds lead to rapid changes in predicted probability. This steep transition ensures that cases near the decision boundary (e.g., whether a user converts or not) are very sensitive to small shifts in input values.
- Closer to the extremes: As the log-odds increase far above 0 or fall far below 0, the sigmoid function flattens out. This means that for very high positive or negative log-odds, the predicted probabilities approach 1 or 0, respectively, but they do so slowly. As the probability nears 0 or 1, the model becomes more confident in its prediction, and additional changes in the predictors have diminishing effects on the probability.
This transformation ensures that logistic regression outputs valid probability values for binary classification problems.
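These properties are easy to verify numerically; the following sketch checks the midpoint, the steep middle region, and the flat tails of the sigmoid:

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

mid = sigmoid(0)                 # exactly 0.5: maximum uncertainty
steep = sigmoid(1) - sigmoid(0)  # large change near the midpoint
flat = sigmoid(6) - sigmoid(5)   # much smaller change out in the tail
near_one = sigmoid(10)           # approaches, but never reaches, 1
```

The same unit step in log-odds moves the probability far more near the decision boundary than it does in the saturated tails.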
Assumptions of logistic regression
Like linear regression, logistic regression comes with its own set of assumptions:
- Linearity of independent variables and log-odds: In linear regression, the assumption is that the independent variables have a linear relationship with the dependent variable. In logistic regression, this assumption changes slightly—the independent variables should have a linear relationship with the log-odds of the dependent variable.
- Independence of observations: Both logistic and linear regression assume that observations are independent of each other.
- Absence of multicollinearity: Both models assume that the predictors should not be highly correlated with one another. High multicollinearity can make it difficult to interpret the impact of individual predictors.
- Large sample size: Logistic regression, like linear regression, benefits from large sample sizes. This ensures that the parameter estimates are reliable, particularly for small probabilities.
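One simple way to screen for multicollinearity before fitting is to inspect the pairwise correlation matrix of the predictors (a minimal sketch with synthetic data; in practice, variance inflation factors are a more thorough check):

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)                          # independent predictor

X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)

# Flag any predictor pair whose absolute correlation exceeds 0.8.
flagged = [(i, j) for i in range(3) for j in range(i + 1, 3)
           if abs(corr[i, j]) > 0.8]
```

Here only the (x1, x2) pair is flagged; dropping or combining one of the two is a common remedy.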
Evaluating model performance
In logistic regression, the goal is not to predict continuous values but to classify outcomes into one of two categories (e.g., conversion or no conversion). To evaluate a logistic regression model’s performance, we need classification-specific metrics. Traditional linear regression metrics treat “error” as the distance between predicted and actual values, which doesn’t make much sense for a binary outcome. More specifically:
- R-squared: Measures the proportion of variance in a continuous dependent variable that is explained by the independent variables. Logistic regression does not predict continuous values but probabilities, which are then classified as 0 or 1, so R-squared is not meaningful.
- RMSE: Measures the average distance between predicted and observed continuous values. Since logistic regression deals with probabilities and classification, there is no continuous “distance” to measure in the same way as in linear regression.
Instead, in classification, there are two different types of “errors” we are concerned with, as well as two different ways in which a prediction can be correct. The metrics we use then take into account the four possible categories a prediction can fall into:
- True Positives (TP): These are cases where the model correctly predicts the positive class. For example, predicting that a user will convert (1) when they actually do convert.
- False Positives (FP): These are cases where the model incorrectly predicts the positive class. For example, predicting that a user will convert (1) when they actually do not convert (0). False positives are important to consider, as they can lead to wasted efforts or resources (e.g., targeting a user who won’t convert).
- True Negatives (TN): These are cases where the model correctly predicts the negative class. For example, predicting that a user will not convert (0) when they actually don’t convert.
- False Negatives (FN): These are cases where the model incorrectly predicts the negative class. For example, predicting that a user will not convert (0) when they actually do convert (1). False negatives are critical to consider because they represent missed opportunities, such as failing to identify users who might have converted.
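Given arrays of actual and predicted labels, these four counts can be tallied directly (a minimal sketch with made-up labels):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # actual outcomes
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])  # model predictions

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # correctly predicted positives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # predicted positive, actually negative
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # correctly predicted negatives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # predicted negative, actually positive
```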
Metrics for model evaluation
Once we understand the classification outcomes, we can use them to calculate various metrics that help evaluate the model’s performance:
- Accuracy: (TP + TN) / (TP + FP + TN + FN), the share of all predictions that are correct
- Precision: TP / (TP + FP), the share of positive predictions that are actually positive
- Recall: TP / (TP + FN), the share of actual positives the model catches
- F1-score: the harmonic mean of precision and recall, 2 × (precision × recall) / (precision + recall)
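For example, given the four counts from a confusion matrix, the standard classification metrics can be computed directly (the counts here are made up for illustration):

```python
tp, fp, tn, fn = 3, 1, 3, 1  # hypothetical confusion-matrix counts

accuracy = (tp + tn) / (tp + fp + tn + fn)          # overall correctness
precision = tp / (tp + fp)                          # how trustworthy positive calls are
recall = tp / (tp + fn)                             # how many actual positives we catch
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```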
Common pitfalls & how to avoid them
- Overfitting: Logistic regression models can overfit the data if too many predictors are included. As with linear regression, feature selection is important to improve model performance and prevent overfitting. In logistic regression, regularization techniques such as L1 (Lasso) and L2 (Ridge) are often used to shrink or eliminate less important features, helping the model generalize better to new data.
- Misinterpreting coefficients: In linear regression, coefficients represent a direct change in the dependent variable. In logistic regression, coefficients represent changes in log-odds, so converting them to odds ratios helps in interpretation.
- Imbalanced datasets: Logistic regression models can struggle with imbalanced datasets, where one class is much more common than the other. Metrics like precision, recall, and F1-score are more informative than accuracy in such cases.
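To illustrate the coefficient-interpretation pitfall: exponentiating a fitted log-odds coefficient converts it to an odds ratio (the coefficient values and feature names below are hypothetical, not from a fitted model):

```python
import math

# Hypothetical fitted coefficients on the log-odds scale.
coefs = {"ad_spend": 0.7, "page_views": 0.1, "bounce_rate": -1.2}

# exp(coefficient) gives the multiplicative change in the odds per
# one-unit increase in that predictor, holding the others fixed.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}
```

A coefficient of 0.7 corresponds to an odds ratio of about 2.01, meaning a one-unit increase in that predictor roughly doubles the odds of the outcome, while a negative coefficient yields an odds ratio below 1 (decreased odds).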