Define Confidence Level in Confidence Interval
In this mock interview, a data scientist answers the question, "What does the confidence level mean when building a confidence interval?"
This is a conceptual question. The interviewer is testing your understanding of the concept, as well as your ability to communicate clearly (ideally illustrated with an example).
Let's start with the technical definition.
- When you make an estimate in statistics, whether it is a summary statistic or a test statistic, there is always uncertainty around that estimate because the number is based on a sample of the population you are studying.
- The confidence interval is a range of values, computed from your sample, that is built to contain the true population value a certain percentage of the time if you run your experiment again or re-sample the population in the same way.
- The confidence level is the percentage of times you expect an interval built this way to contain the true population value across repeated samples, and is set as 1 - α, where α is the significance level (for example, α = 0.05 gives a 95% confidence level).
For example, if you construct a confidence interval with a 95% confidence level, you expect that about 95 out of 100 intervals built this way from repeated samples will contain the true value you are estimating.
Suppose we want to estimate the average height of adult males in a city. We take a random sample of 2,000 males and calculate a 95% confidence interval for the population mean height. If the interval is 170 cm to 180 cm, this does not mean that 95 out of 100 repeated sample means will land between 170 and 180 cm. It means that if we repeated this sampling 100 times (each time with 2,000 males) and computed a new interval each time, about 95 of those intervals would contain the true population mean height.
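The repeated-sampling idea can be checked with a small simulation. The sketch below is illustrative only: the population mean (175 cm) and standard deviation (7 cm) are made-up values, heights are assumed to be normally distributed, and the interval uses a normal (z) critical value, which is a reasonable approximation for a sample of 2,000. It draws many samples, builds a 95% interval from each, and counts how often the interval contains the true mean.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population values, chosen only for illustration.
TRUE_MEAN_CM = 175.0
TRUE_SD_CM = 7.0
SAMPLE_SIZE = 2000
N_REPEATS = 500
CONFIDENCE = 0.95


def mean_confidence_interval(sample, confidence):
    """Return a z-based (lower, upper) confidence interval for the mean."""
    n = len(sample)
    center = fmean(sample)
    standard_error = stdev(sample) / n ** 0.5
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)
    return center - z_crit * standard_error, center + z_crit * standard_error


covered = 0
for _ in range(N_REPEATS):
    sample = [random.gauss(TRUE_MEAN_CM, TRUE_SD_CM) for _ in range(SAMPLE_SIZE)]
    lower, upper = mean_confidence_interval(sample, CONFIDENCE)
    # Count the intervals that capture the true population mean.
    if lower <= TRUE_MEAN_CM <= upper:
        covered += 1

coverage = covered / N_REPEATS
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

Under these assumptions the printed share should land close to 0.95. The true mean never changes between runs. What varies is the interval.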
Let's say your interviewer asks you the follow-up question, "How does the confidence level affect the width of the confidence interval?"
You could say, "Typically, there is a trade-off between the width of the confidence interval and the confidence level. A higher confidence level leads to a wider interval because the interval has to cover a larger range of values to capture the parameter of interest more often. Conversely, a lower confidence level results in a narrower interval, but it also means we are less confident that the interval contains the parameter of interest."
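To make the trade-off concrete, here is a minimal sketch that computes the interval width at three confidence levels for the same data. The sample standard deviation (7 cm) is a hypothetical value, and the width formula assumes a z-based interval for a mean, 2 × z × s / √n.

```python
from statistics import NormalDist

# Hypothetical sample summary, chosen only for illustration.
SAMPLE_SD_CM = 7.0
SAMPLE_SIZE = 2000


def interval_width(confidence, sd, n):
    """Width of a z-based confidence interval for a mean."""
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)
    return 2 * z_crit * sd / n ** 0.5


widths = {
    level: interval_width(level, SAMPLE_SD_CM, SAMPLE_SIZE)
    for level in (0.90, 0.95, 0.99)
}

for level, width in widths.items():
    print(f"{level:.0%} confidence level -> width {width:.3f} cm")
```

With the sample size and standard deviation held fixed, only the critical value changes, so the width grows as the confidence level rises.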