How to Calculate Confidence Intervals

Questions about confidence intervals typically ask you to:

  1. Demonstrate conceptual understanding. Example: What does the confidence level mean when you’re building a confidence interval?
  2. Interpret confidence intervals. Example: How would you interpret a confidence interval that includes zero?
  3. Estimate the confidence interval for a given population parameter, which is usually the population mean $\mu$ in hypothesis testing.

This lesson will cover:

  • The relevance of confidence intervals in a data scientist role
  • How to calculate confidence intervals using analytical and bootstrapping techniques

Why confidence intervals matter

When sharing results from a hypothesis test or statistical analysis, stakeholders are often interested in the confidence interval along with the point estimate, since the interval quantifies uncertainty and risk, which aids decision-making.

Confidence intervals are also important in statistical inference for estimating population parameters and assessing the uncertainty associated with those estimates.

A confidence interval is a range of values, computed from sample data, that is constructed to contain the true population parameter a certain percentage of the time if you run your experiment again or re-sample the population in the same way.

The confidence level (often denoted $1-\alpha$) is that percentage: the proportion of intervals, across repeated samples, that you expect to contain the true parameter. At a 95% confidence level, for example, about 95 out of 100 such intervals would capture the true value.
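The repeated-sampling idea behind the confidence level can be checked with a small simulation. The sketch below is illustrative only: the population values (`true_mean`, `sigma`), the sample size, and the function name `coverage_rate` are assumptions made for the example, and the population standard deviation is treated as known so that a z-based interval applies.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(true_mean=50.0, sigma=10.0, n=40, level=0.95,
                  trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Margin of error; sigma is assumed known in this simulation
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample from the (hypothetical) normal population
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / trials

# The printed fraction should land close to the chosen level of 0.95
print(coverage_rate())
```

Each simulated interval either contains the true mean or it does not; the confidence level describes how often the procedure succeeds over many repetitions.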

How to calculate confidence intervals

Two methods you can use to calculate a confidence interval are:

  1. Analytical
  2. Bootstrapping

Analytical

The analytical approach uses the formula:

$$\text{Confidence Interval} = \bar{x} \pm E$$

Where

  • $\bar{x}$ = sample mean
  • $E$ = margin of error

This approach makes the following assumptions:

  • the sample is drawn from a population that follows a normal distribution, or
  • the sample size is sufficiently large (Central Limit Theorem applies).

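If the population standard deviation ($\sigma$) is known, use the z-distribution. If it is unknown and estimated by the sample standard deviation $s$, use the t-distribution with $n-1$ degrees of freedom. This matters most when the sample size is small ($n<30$); for large samples the t-distribution is nearly identical to the z-distribution.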
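As a rough illustration of where z critical values come from, the sketch below looks them up with Python's standard library. The helper name `z_critical` is made up for this example. The standard library has no t-distribution, so a t critical value would need a third-party call such as `scipy.stats.t.ppf(1 - alpha / 2, df=n - 1)`.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided critical value of the standard normal for a confidence level."""
    alpha = 1 - level
    # Put alpha / 2 in each tail, then read off the upper cut-off
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

A higher confidence level gives a larger critical value and therefore a wider interval.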

Steps:

  1. Calculate the sample mean ($\bar{x}$) and sample standard deviation ($s$).
  2. Determine the appropriate critical value from the standard normal distribution (z-distribution) or t-distribution based on the chosen confidence level and the degrees of freedom (if using the t-distribution).
  3. Calculate the margin of error ($E$) using the formula, where $n$ is the sample size:
$$E = \text{critical value} \times \frac{s}{\sqrt{n}}$$
  4. Calculate the confidence interval using the formula:
$$\text{Confidence Interval} = \bar{x} \pm E$$

Where

  • $\bar{x}$ = sample mean
  • $E$ = margin of error
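The analytical steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name `mean_confidence_interval` and the generated data are hypothetical, and a z critical value is used as a large-sample approximation. For a small sample with unknown population standard deviation, a t critical value would replace it.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, level=0.95):
    """Large-sample (z-based) confidence interval for the population mean."""
    n = len(data)
    x_bar = mean(data)                 # step 1: sample mean
    s = stdev(data)                    # step 1: sample std dev (n - 1 denominator)
    critical = NormalDist().inv_cdf(1 - (1 - level) / 2)  # step 2: critical value
    margin = critical * s / sqrt(n)    # step 3: margin of error E
    return x_bar - margin, x_bar + margin  # step 4: x_bar ± E

# Hypothetical data for illustration only: 200 draws from a normal population
rng = random.Random(1)
sample = [rng.gauss(100, 15) for _ in range(200)]

low, high = mean_confidence_interval(sample, level=0.95)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The interval is centered on the sample mean, and it narrows as the sample size grows because the margin of error shrinks with $\sqrt{n}$.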

Bootstrapping

Bootstrapping is a resampling technique used to estimate the sampling distribution of a statistic by repeatedly resampling from the observed data. Bootstrapping requires fewer assumptions than the analytical method but is usually computationally more intensive.

If the population distribution is unknown or the sample size is small and the central limit theorem does not apply, bootstrapping provides a non-parametric alternative for estimating confidence intervals.

Steps:

  1. Randomly sample, with replacement, from the observed data to create bootstrap samples. Each bootstrap sample should have the same size as the original dataset. Repeat this process to create a large number of bootstrap samples (e.g. 1,000 or 10,000); more bootstrap samples reduce the simulation noise in the estimated percentiles.
  2. For each bootstrap sample, calculate the statistic of interest. This could be the sample mean, sample median, sample proportion, or any other statistic relevant to your analysis. Repeat this calculation for each bootstrap sample to create a distribution of bootstrap statistics.
  3. Determine the desired confidence level for your confidence interval (e.g. 95%, 99%). Use the empirical quantiles of the bootstrap statistic distribution to calculate the confidence interval. To construct a 95% confidence interval, for example, find the 2.5th and 97.5th percentiles of the bootstrap statistic distribution. The interval between these percentiles constitutes the 95% confidence interval for the statistic of interest.
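The bootstrap steps above can be sketched as a percentile bootstrap in plain Python. The function name `bootstrap_ci`, the number of resamples, and the skewed example data are all assumptions made for illustration. The percentiles are approximated by picking order statistics from the sorted bootstrap results rather than by interpolating.

```python
import random
from statistics import mean, median

def bootstrap_ci(data, statistic=mean, level=0.95, n_resamples=5000, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Steps 1 and 2: resample with replacement (same size as the data)
    # and compute the statistic on each bootstrap sample
    boot_stats = sorted(
        statistic(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Step 3: take the empirical alpha/2 and 1 - alpha/2 quantiles
    alpha = 1 - level
    lower_idx = round((alpha / 2) * n_resamples)
    upper_idx = round((1 - alpha / 2) * n_resamples) - 1
    return boot_stats[lower_idx], boot_stats[upper_idx]

# Hypothetical skewed data for illustration only
data_rng = random.Random(42)
sample = [data_rng.expovariate(1 / 20) for _ in range(60)]

print("95% CI for the mean:  ", bootstrap_ci(sample))
print("95% CI for the median:", bootstrap_ci(sample, statistic=median))
```

Passing the statistic as a function means the same routine can be used for the mean, the median, or any other statistic computed from a sample.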