Calculate Conditional Probability (Bayes' Theorem)

In this mock interview, a data scientist answers a question about conditional probability:

"Suppose a factory produces light bulbs, and two machines, A and B, are responsible for producing them. Machine A produces 60% of the bulbs while Machine B produces the remaining 40%. Machine A produces defective bulbs at a rate of 5%, while Machine B produces defective bulbs at a rate of 3%. If a randomly selected bulb is defective, what is the probability that it was produced by Machine A?"

This is a numerical question. The interviewer is testing whether you can identify the correct statistical technique (Bayes' theorem) and apply it correctly. They also want to understand your problem-solving approach, i.e., each step you take as you work through the problem.

Step 1: Define the problem

The problem is to calculate the probability that a bulb was produced by Machine A, given that a randomly selected bulb is defective. This is a conditional probability, P(A|D), so Bayes' theorem is the appropriate technique.

Step 2: Identify assumptions and variables

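The main assumption is that the two machines partition the factory's output: every bulb is produced by exactly one of Machine A or Machine B, so the events A and B are mutually exclusive and exhaustive, and P(A) + P(B) = 1. This is what allows P(D) to be expanded with the law of total probability. Bayes' theorem itself does not require conditional independence; that is an additional assumption made by models such as naive Bayes. We also assume that the bulb is selected uniformly at random from the combined output of both machines.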

Our variables are:

P(A) = Probability that Machine A produces the bulb = 60% = 0.6

P(B) = Probability that Machine B produces the bulb = 40% = 0.4

P(D) = Probability that a bulb is defective

P(D|A) = Probability that a bulb is defective given it was produced by A = 5% = 0.05

P(D|B) = Probability that a bulb is defective given it was produced by B = 3% = 0.03

P(A|D) = Probability that a bulb was produced by A given that it is defective (this is what we need to calculate)

P(D) is not given directly, so we compute it with the law of total probability:

P(D) = P(D|A) * P(A) + P(D|B) * P(B)

= 0.05 * 0.6 + 0.03 * 0.4

= 0.03 + 0.012

= 0.042
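
The total-probability step above can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and exact fractions are used here only to avoid floating-point rounding.

```python
from fractions import Fraction

# Priors: share of bulbs produced by each machine
p_a = Fraction(60, 100)
p_b = Fraction(40, 100)

# Likelihoods: defect rate of each machine
p_d_given_a = Fraction(5, 100)
p_d_given_b = Fraction(3, 100)

# Law of total probability: P(D) = P(D|A)P(A) + P(D|B)P(B)
p_d = p_d_given_a * p_a + p_d_given_b * p_b

print(p_d, float(p_d))  # 21/500 0.042
```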

Step 3: Apply the statistical technique or formula

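Bayes' theorem for two events X and Y:

P(X|Y) * P(Y) = P(Y|X) * P(X)

Here X is "produced by Machine A" (A) and Y is "defective" (D), so dividing both sides by P(D) gives the form we need.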

Now, we’ll plug in the values:

P(A|D) = P(D|A) * P(A)/P(D)

= 0.05 * 0.6 / 0.042

= 5/7 ≈ 0.714
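
The same calculation can be written as a small helper. This is a sketch under the problem's stated numbers; the function name `posterior` and its parameter names are illustrative rather than taken from any library.

```python
from fractions import Fraction


def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence


# Values from the problem statement, as exact fractions
p_a, p_b = Fraction(3, 5), Fraction(2, 5)
p_d_given_a, p_d_given_b = Fraction(1, 20), Fraction(3, 100)

# Evidence term P(D) from the law of total probability
p_d = p_d_given_a * p_a + p_d_given_b * p_b

p_a_given_d = posterior(p_a, p_d_given_a, p_d)
print(p_a_given_d)  # 5/7
```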

Step 4: Check your work

Now, let’s double-check our calculations. One way to check the math is to calculate:

P(B|D) = P(D|B) * P(B)/P(D)

= 0.03 * 0.4 / 0.042

= 2/7 ≈ 0.286

P(A|D) + P(B|D) = 5/7 + 2/7 = 1. This is expected: a defective bulb must have come from either Machine A or Machine B, so the two conditional probabilities must add to 1. The check is consistent with our result above.
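
This sanity check generalizes to any number of machines. The sketch below computes the conditional probability for every machine at once and confirms that they sum to 1. The dictionary layout is an illustrative choice, not part of the original solution.

```python
from fractions import Fraction

# Share of production and defect rate per machine
priors = {"A": Fraction(3, 5), "B": Fraction(2, 5)}
defect_rates = {"A": Fraction(1, 20), "B": Fraction(3, 100)}

# Joint probabilities P(D and machine) = P(D|machine) * P(machine)
joint = {m: priors[m] * defect_rates[m] for m in priors}

# P(D) is the sum of the joint probabilities over all machines
p_d = sum(joint.values())

# P(machine|D) for each machine
posteriors = {m: joint[m] / p_d for m in joint}

for machine, prob in posteriors.items():
    print(machine, prob)

# The posteriors must sum to exactly 1
assert sum(posteriors.values()) == 1
```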

Step 5: Re-visit the problem scope

As defined previously, we need to calculate the probability that a bulb is produced by machine A, given that it is randomly selected and defective.

Given that a randomly selected bulb is defective, the probability that it was produced by Machine A is 5/7 (about 71%). This makes sense: Machine A both produces a larger share of the bulbs (60% versus 40%) and has a higher defect rate (5% versus 3%), so the probability rises above Machine A's 60% share of production.
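
One way to build intuition for this result is to count bulbs instead of multiplying probabilities. The sketch below assumes a hypothetical batch of 10,000 bulbs (a number chosen only so that every count is a whole number) and counts the defective bulbs from each machine.

```python
from fractions import Fraction

total_bulbs = 10_000  # hypothetical batch size, chosen for round counts

# Split the batch by machine: 60% from A, the rest from B
from_a = total_bulbs * 60 // 100   # 6,000 bulbs
from_b = total_bulbs - from_a      # 4,000 bulbs

# Expected defective bulbs from each machine
defective_a = from_a * 5 // 100    # 300 bulbs
defective_b = from_b * 3 // 100    # 120 bulbs

# Among all defective bulbs, the fraction that came from Machine A
share_from_a = Fraction(defective_a, defective_a + defective_b)
print(share_from_a)  # 5/7
```

Of the 420 defective bulbs in this batch, 300 come from Machine A, which matches the 5/7 obtained with Bayes' theorem.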

Step 6: Check in with the interviewer

Be prepared to discuss your solution further if the interviewer has follow-up questions or wants to explore alternative approaches. Be open to feedback and constructive criticism. For example, your interviewer might ask, “What is an alternative approach to using Bayes' theorem in this scenario?” You could explore simulation, and then explain why it’s not as suitable as Bayes' theorem.

Simulation can be used to estimate the required probability. However, there are some disadvantages to using simulation compared to directly applying Bayes' theorem:

  • Computational cost: Simulation needs many random draws to estimate a probability that Bayes' theorem gives in a single closed-form calculation. The cost grows for complex problems or scenarios with many variables, so simulation requires more resources.
  • Accuracy: A simulation produces an estimate, not an exact value. Its sampling error depends on the number of iterations performed, and its validity depends on the quality of the simulation model. For more complex processes, it may be challenging to model the underlying mechanism accurately, leading to biased or unreliable results.
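
To make the comparison concrete, here is a minimal Monte Carlo sketch of the simulation approach. The function name, the default number of simulated bulbs, and the fixed seed are all illustrative assumptions. The result is an estimate that varies with the seed and the sample size, unlike the exact 5/7.

```python
import random


def simulate_p_a_given_d(n_bulbs=200_000, seed=0):
    """Estimate P(A|D) by simulating bulb production."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    defective_total = 0
    defective_from_a = 0
    for _ in range(n_bulbs):
        # Pick the machine: A with probability 0.60, otherwise B
        from_a = rng.random() < 0.60
        # Defect rate depends on which machine made the bulb
        defect_rate = 0.05 if from_a else 0.03
        if rng.random() < defect_rate:
            defective_total += 1
            if from_a:
                defective_from_a += 1
    # Fraction of defective bulbs that came from Machine A
    return defective_from_a / defective_total


estimate = simulate_p_a_given_d()
print(f"simulated: {estimate:.4f}, exact: {5 / 7:.4f}")
```

Only about 4% of the simulated bulbs are defective, so most draws do not contribute to the estimate. This is one reason the simulation needs many iterations to get close to the exact answer.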