Data Scientist Statistics & Experimentation Interview Questions

Review this list of 11 statistics & experimentation data scientist interview questions and answers verified by hiring managers and candidates.
  • "Before proceeding, I just wanted to clarify we wanted to check for the impact of showing content from non-friends in users’ feeds, and here non-friends I would assume could be anyone, but mainly like content creators, and I am not including ads here. But I wanted to ask if there is any current logic as to what posts to show based on users' affinity to those posts, maybe basis the user engagement to Insta feed. now objective of this would be to improve the engagement of the platform, as if users"

    Dhruv S. - "Before proceeding, I just wanted to clarify we wanted to check for the impact of showing content from non-friends in users’ feeds, and here non-friends I would assume could be anyone, but mainly like content creators, and I am not including ads here. But I wanted to ask if there is any current logic as to what posts to show based on users' affinity to those posts, maybe basis the user engagement to Insta feed. now objective of this would be to improve the engagement of the platform, as if users"See full answer

    Data Scientist
    Statistics & Experimentation
  • "The distribution of daily search queries per user, as shown in the histogram, can be described as approximately normal (or bell-shaped) with a slight positive skew. Key Characteristics: Shape: The distribution is roughly symmetrical around its center, resembling a bell curve. This indicates that most users perform a moderate number of daily search queries. Central Tendency: The peak of the distribution, representing the highest density of users, appears to be around **8"

    Sam A. - "The distribution of daily search queries per user, as shown in the histogram, can be described as approximately normal (or bell-shaped) with a slight positive skew. Key Characteristics: Shape: The distribution is roughly symmetrical around its center, resembling a bell curve. This indicates that most users perform a moderate number of daily search queries. Central Tendency: The peak of the distribution, representing the highest density of users, appears to be around **8"See full answer

    Data Scientist
    Statistics & Experimentation
  • "Is there a reason a confidence interval was used to solve this problem over just using the mean/expected value directly?"

    Aarav G. - "Is there a reason a confidence interval was used to solve this problem over just using the mean/expected value directly?"See full answer

    Data Scientist
    Statistics & Experimentation
  • Microsoft logoAsked at Microsoft 

    "In the Transformer architecture, the decoder differs from the encoder primarily in its additional mechanisms designed to handle autoregressive sequence generation. Here's a breakdown of the key differences: Self-Attention Mechanism: Encoder: The encoder has a standard self-attention mechanism that allows each token to attend to all other tokens in the input sequence. Decoder: The decoder has two types of self-attention. The first is the same as in the encoder, but the second is mas"

    Ranj A. - "In the Transformer architecture, the decoder differs from the encoder primarily in its additional mechanisms designed to handle autoregressive sequence generation. Here's a breakdown of the key differences: Self-Attention Mechanism: Encoder: The encoder has a standard self-attention mechanism that allows each token to attend to all other tokens in the input sequence. Decoder: The decoder has two types of self-attention. The first is the same as in the encoder, but the second is mas"See full answer

    Data Scientist
    Statistics & Experimentation
  • Data Scientist
    Statistics & Experimentation
    +2 more
  • 🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

  • Meta (Facebook) logoAsked at Meta (Facebook) 
    Video answer for 'Explain Bayes' theorem.'

    "Is it bad to get the answer a different way? Will they mark that as not knowing Bayes Theorem or just correct as it is an easier way to get the answer? The way I went is to look at what happens when the factory makes 100 light bulbs. Machine A makes 60 of which 3 are faulty, Machine B makes 40 of which 1.2 are faulty. Therefore the pool of faulty lightbulbs is 3/4.2 = 5/7 from machine A and 1.2/4.2 = 3/7 from Machine B."

    Will I. - "Is it bad to get the answer a different way? Will they mark that as not knowing Bayes Theorem or just correct as it is an easier way to get the answer? The way I went is to look at what happens when the factory makes 100 light bulbs. Machine A makes 60 of which 3 are faulty, Machine B makes 40 of which 1.2 are faulty. Therefore the pool of faulty lightbulbs is 3/4.2 = 5/7 from machine A and 1.2/4.2 = 3/7 from Machine B."See full answer

    Data Scientist
    Statistics & Experimentation
    +2 more
  • "I would use A/B testing to see if the new feature would be incrementally beneficial. To begin the testing, we should define what's the goal of this testing. Let's say the new feature would increase the average number of trade by X. Then randomly assign the clients to two groups, control and test group. Control group doesn't see the new feature and the test group see the new feature. We could also stratified sampling if we want to make sure cover different customer segmentation. During this desig"

    Jiin S. - "I would use A/B testing to see if the new feature would be incrementally beneficial. To begin the testing, we should define what's the goal of this testing. Let's say the new feature would increase the average number of trade by X. Then randomly assign the clients to two groups, control and test group. Control group doesn't see the new feature and the test group see the new feature. We could also stratified sampling if we want to make sure cover different customer segmentation. During this desig"See full answer

    Data Scientist
    Statistics & Experimentation
  • "I would conduct a sample z-test because we have enough samples and the population variance is known. H1: average monthly spending per user is $50 H0: average monthly spending per user is greater $50 One-sample z-test x_bar = $85 mu = $50 s = $20 n = 100 x_bar - mu / (s / sqrt(n) = 17.5 17.5 is the z-score that we will need to associate with its corresponding p-value. However, the z-score is very high, so the p-value will be very close to zero, which is much less than the standa"

    Lucas G. - "I would conduct a sample z-test because we have enough samples and the population variance is known. H1: average monthly spending per user is $50 H0: average monthly spending per user is greater $50 One-sample z-test x_bar = $85 mu = $50 s = $20 n = 100 x_bar - mu / (s / sqrt(n) = 17.5 17.5 is the z-score that we will need to associate with its corresponding p-value. However, the z-score is very high, so the p-value will be very close to zero, which is much less than the standa"See full answer

    Data Scientist
    Statistics & Experimentation
  • Data Scientist
    Statistics & Experimentation
  • McKinsey logoAsked at McKinsey 

    "The cases where data is under heavy outlier influence. Since mean fluctuates due to the presence of an outlier, median might be a better measure"

    Himani E. - "The cases where data is under heavy outlier influence. Since mean fluctuates due to the presence of an outlier, median might be a better measure"See full answer

    Data Scientist
    Statistics & Experimentation
  • "Probability that a coupon is used = P Probability that a coupon is not used = 1-P Probability that none of the N coupons are used = (1-P)^N Probability that at least one of the N coupons are used = 1 - (1-P)^N"

    Saurabh K. - "Probability that a coupon is used = P Probability that a coupon is not used = 1-P Probability that none of the N coupons are used = (1-P)^N Probability that at least one of the N coupons are used = 1 - (1-P)^N"See full answer

    Data Scientist
    Statistics & Experimentation
Showing 1-11 of 11