False Positive Distribution
Question: If you sample 10,000 users multiple times, what would the distribution of false positives look like?
Options: (1) Exponential, (2) Normal, (3) Binomial, (4) None of the above
A false positive (a Type I error) is the rejection of a null hypothesis that is actually true. In hypothesis testing, the false positive rate is denoted by alpha (α) and is typically set at 5%, or 0.05. Whatever the shape of the population distribution (normal, uniform, or anything else), if each sample consists of 10,000 independent users, the number of false positives in a single sample follows a binomial distribution. This is because each user is treated as an independent trial with the same probability α of producing a false positive.
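The binomial argument can be made concrete with a short simulation. The sketch below is illustrative only: it treats each of the 10,000 users as an independent Bernoulli trial with success probability α = 0.05, as described above. The function name `count_false_positives` and the seed are hypothetical choices, not part of the original question.

```python
import random

ALPHA = 0.05       # false positive rate from the text
N_USERS = 10_000   # users per sample from the text


def count_false_positives(n_users, alpha, rng):
    """Count successes among n_users independent Bernoulli(alpha) trials."""
    # Each user independently "produces a false positive" with probability alpha.
    return sum(rng.random() < alpha for _ in range(n_users))


rng = random.Random(0)  # arbitrary fixed seed so the run is reproducible
count = count_false_positives(N_USERS, ALPHA, rng)

print(f"false positives in one sample: {count}")
print(f"expected value n * alpha:      {N_USERS * ALPHA:.0f}")
```

A single run gives one draw from a binomial distribution with n = 10,000 and p = 0.05, so the count should land near, but usually not exactly on, n × α.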
If we repeat the sampling of 10,000 users many times, the number of false positives in each sample is a sum of 10,000 independent Bernoulli trials, so the Central Limit Theorem (CLT) tells us that its distribution across samples is approximately normal. The approximation is good here because n is large and both np = 500 and n(1-p) = 9,500 are large. The binomial distribution with parameters n = 10,000 and p = 0.05 has a mean of np = 500 and a variance of np(1-p) = 475, which gives a standard deviation of about 21.8. Therefore, over many repeated samples, the number of false positives is exactly binomial and approximately normally distributed around a mean of 500.
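One way to check the normal approximation is to repeat the sampling and compare the empirical mean and standard deviation of the counts with np and the square root of np(1-p). The sketch below is a rough check under the same independence assumption. The number of repetitions (`N_SAMPLES = 200`) and the seed are arbitrary, hypothetical choices.

```python
import random
import statistics

ALPHA = 0.05       # false positive rate from the text
N_USERS = 10_000   # users per sample from the text
N_SAMPLES = 200    # hypothetical number of repeated samples

rng = random.Random(0)  # arbitrary fixed seed so the run is reproducible

# One false-positive count per repeated sample of N_USERS users.
counts = [
    sum(rng.random() < ALPHA for _ in range(N_USERS))
    for _ in range(N_SAMPLES)
]

theory_mean = N_USERS * ALPHA                        # np
theory_sd = (N_USERS * ALPHA * (1 - ALPHA)) ** 0.5   # sqrt(np(1-p))

sample_mean = statistics.mean(counts)
sample_sd = statistics.stdev(counts)

# For a normal distribution, roughly 68% of draws fall within one
# standard deviation of the mean.
within_one_sd = sum(abs(c - theory_mean) <= theory_sd for c in counts) / N_SAMPLES

print(f"mean: simulated {sample_mean:.1f}, theory {theory_mean:.1f}")
print(f"sd:   simulated {sample_sd:.1f}, theory {theory_sd:.1f}")
print(f"share within one sd of the mean: {within_one_sd:.2f}")
```

With more repetitions, the simulated mean and standard deviation should move closer to the theoretical values, and a histogram of `counts` should look increasingly bell-shaped.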