Data Scientist Interview Questions

Review this list of 96 data scientist interview questions and answers verified by hiring managers and candidates.
  • Dhruv S. - "Before proceeding, I want to clarify that we are checking the impact of showing content from non-friends in users' feeds. By non-friends I assume anyone, but mainly content creators, and I am not including ads. I also want to ask whether there is any current logic for which posts to show based on users' affinity to those posts, perhaps based on user engagement with the Insta feed. The objective would be to improve engagement on the platform, as if users…"

    Data Scientist
    Statistics & Experimentation
  • Steve Y. - "Define: how are daily post views calculated? Isolate issues: data issue; time period; geo; iOS vs Android vs Web. Correlated metrics in the funnel: DAU; time spent/scrolls; engagement (likes, comments). External factors: competitor actions; big events. Internal factors: product launch; feature change."

    Data Scientist
    Data Analysis
  • Saurabh K. - "Daily search queries per user are expected to be skewed: a long tail of users issues more queries than average, while a large majority of users issue fewer. The distribution is likely to be right-skewed."

    Data Scientist
    Statistics & Experimentation
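
    The right-skew claim can be checked on any sample by comparing the mean with the median (a long upper tail pulls the mean above the median) and by computing the sample skewness. A minimal sketch, assuming simulated per-user counts drawn from a lognormal distribution as a hypothetical stand-in for real query logs:

```python
import math
import random
import statistics

def skew_summary(counts):
    """Return (mean, median, sample skewness) for a list of per-user counts.

    Skewness is the adjusted Fisher-Pearson coefficient; a positive value
    indicates a longer right (upper) tail. Requires at least 3 values.
    """
    n = len(counts)
    mean = statistics.fmean(counts)
    median = statistics.median(counts)
    sd = statistics.stdev(counts)
    cubed_z_sum = sum(((x - mean) / sd) ** 3 for x in counts)
    skewness = cubed_z_sum * n / ((n - 1) * (n - 2))
    return mean, median, skewness

# Hypothetical data: simulated daily query counts per user (lognormal, floored to integers).
random.seed(0)
counts = [math.floor(random.lognormvariate(1.0, 1.0)) for _ in range(10_000)]
mean, median, skewness = skew_summary(counts)
print(f"mean={mean:.2f} median={median} skewness={skewness:.2f}")
```

    On right-skewed data like this, expect the mean to sit above the median and the skewness to be positive.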
  • Video answer for 'SQL Stored Procedures'

    Amney M. - "It is a really good explanation, thanks."

    Data Scientist
    Coding
  • Vishal S. - "Product understanding: ads are content from companies shown as stories, posts, or reels; posts are from users (connections). We have to design an experience that maximizes engagement while generating ad revenue. Clarifying questions: Is it specific to posts, stories, or reels? Is there an existing post-to-ad ratio, or do we start from scratch? Is it specific to a device/OS? Is it specific to a region or user demographic? Assumption: existing post-to-ad ratio…"

    Data Scientist
    Data Analysis
  • 🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

  • Himanshu G. - "To model ROI for a product launch, first define the timeline you are targeting, for example 6 months post-launch, 1 year, or even 5 years. Tip: start with a 1-year ROI projection to estimate near-term returns, and build a 3-year projection to evaluate growth and scalability. ROI starts from the net return over that period: Profit = Revenue (within timeline) − Total Cost (from project start). Total Cost includes both fixed and variable costs incurred since t…"

    Data Scientist
    Data Analysis
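
    The profit formula above becomes a ratio by dividing by total cost, which is the conventional definition of ROI. A sketch comparing a 1-year and a 3-year horizon as the tip suggests. It assumes fixed cost covers everything spent before launch and variable cost accrues at a flat monthly rate. All figures are hypothetical placeholders, not data from the answer.

```python
def roi(monthly_revenue, fixed_cost, monthly_variable_cost, months):
    """ROI over the first `months` months after launch.

    profit = revenue within the timeline - total cost since project start
    ROI    = profit / total cost
    Assumes fixed_cost covers all pre-launch spend and variable cost accrues
    at a flat monthly rate after launch (both simplifications).
    """
    if months > len(monthly_revenue):
        raise ValueError("revenue forecast is shorter than the requested timeline")
    revenue = sum(monthly_revenue[:months])
    total_cost = fixed_cost + monthly_variable_cost * months
    profit = revenue - total_cost
    return profit / total_cost

# Hypothetical inputs: 20,000/month revenue, 200,000 up front, 5,000/month running cost.
forecast = [20_000] * 36
for horizon in (12, 36):
    print(horizon, "months:", round(roi(forecast, 200_000, 5_000, horizon), 3))
```

    With these placeholder numbers the project is slightly negative at 12 months and clearly positive at 36. That is the kind of contrast the 1-year/3-year tip is meant to surface.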
  • Asked at DoorDash

    Saurabh K. - "Missing item: the user ordered multiple items and a few are missing. Wrong item: the entire order is wrong, or the order contains items that were never ordered. How is this measured? CSAT, missing items, wrong items. Step 1: Collect data on orders that reported missing or wrong items. Dive deep to understand whether the problem is isolated to a specific metro/zip code/restaurant type (say fast food vs fine dining), time of day (lunch vs dinner), tenure of the courier on th…"

    Data Scientist
    Statistics & Experimentation
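
    Step 1 (checking whether missing or wrong items are concentrated in one metro, restaurant type, or time of day) amounts to computing an issue rate per segment and comparing it with the overall rate. A minimal stdlib sketch. The order records, the field names (`metro`, `time_of_day`, `issue`) and the 1.5x flagging threshold are all hypothetical.

```python
from collections import defaultdict

ISSUE_TYPES = ("missing", "wrong")

def issue_rate_by_segment(orders, segment_key):
    """Share of orders with a missing/wrong-item report, for each value of segment_key."""
    totals = defaultdict(int)
    issues = defaultdict(int)
    for order in orders:
        segment = order[segment_key]
        totals[segment] += 1
        if order["issue"] in ISSUE_TYPES:
            issues[segment] += 1
    return {segment: issues[segment] / totals[segment] for segment in totals}

def flag_segments(orders, segment_key, ratio=1.5):
    """Segments whose issue rate exceeds `ratio` times the overall rate (threshold is arbitrary)."""
    overall = sum(o["issue"] in ISSUE_TYPES for o in orders) / len(orders)
    rates = issue_rate_by_segment(orders, segment_key)
    return sorted(s for s, r in rates.items() if r > ratio * overall)

# Hypothetical orders; "issue" is None, "missing", or "wrong".
orders = [
    {"metro": "SF", "time_of_day": "lunch", "issue": None},
    {"metro": "SF", "time_of_day": "dinner", "issue": "missing"},
    {"metro": "NYC", "time_of_day": "lunch", "issue": None},
    {"metro": "NYC", "time_of_day": "dinner", "issue": None},
]
for key in ("metro", "time_of_day"):
    print(key, issue_rate_by_segment(orders, key), flag_segments(orders, key))
```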
  • Asked at Meta (Facebook)

    theproductguy - "Clarifying questions and possible responses: both audio and video. Goals: increase engagement time among groups/communities and not require another platform for group calls (be one-stop for communication). Region: TBD. iOS/Android. Only available to users in a group, to call users within the group. Who can initiate these calls: only the admin, or anyone? Metrics: NSM: feature engagement (C), number of calls made in a week per user (C). PM: % of people joining the call in a group…"

    Data Scientist
    Data Analysis
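
    The two proposed metrics, calls made per user per week and the share of group members who join a call, can be computed directly from a call log. A sketch assuming a hypothetical log in which each call records its group, its week, and the set of members who joined, plus a map of group sizes. The active-user denominator is assumed constant across weeks.

```python
import statistics
from collections import defaultdict

def calls_per_user_per_week(calls, active_users):
    """Calls started per active user, for each week (active_users assumed constant)."""
    calls_by_week = defaultdict(int)
    for call in calls:
        calls_by_week[call["week"]] += 1
    return {week: n / active_users for week, n in calls_by_week.items()}

def mean_join_rate(calls, group_sizes):
    """Average share of a group's members who joined each call."""
    rates = [len(call["joined"]) / group_sizes[call["group"]] for call in calls]
    return statistics.fmean(rates)

# Hypothetical call log and group sizes.
calls = [
    {"group": "g1", "week": 1, "joined": {"a", "b"}},
    {"group": "g1", "week": 1, "joined": {"a", "b", "c", "d"}},
    {"group": "g2", "week": 2, "joined": {"x", "y"}},
]
group_sizes = {"g1": 4, "g2": 8}
print(calls_per_user_per_week(calls, active_users=10))
print(round(mean_join_rate(calls, group_sizes), 3))
```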
  • Aarav G. - "Is there a reason a confidence interval was used to solve this problem over just using the mean/expected value directly?"

    Data Scientist
    Statistics & Experimentation
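
    One way to see why an interval is reported alongside the mean: a point estimate alone says nothing about how precisely it is known, while the interval widens with variability and narrows with sample size. A sketch of a normal-approximation 95% interval for a mean. The two samples are made up for illustration, since the original problem's data is not shown here.

```python
import math
import statistics

def mean_with_ci(sample, confidence=0.95):
    """Sample mean plus a normal-approximation confidence interval.

    For small samples a t critical value is more appropriate than z.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, (mean - z * standard_error, mean + z * standard_error)

# Two hypothetical samples with the same mean but very different spread.
tight = [9, 10, 11] * 10
noisy = [0, 10, 20] * 10
print(mean_with_ci(tight))
print(mean_with_ci(noisy))
```

    Both samples have the same mean of 10, but the second interval is much wider. That difference is exactly the information a bare expected value would hide.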
  • Asked at Microsoft

    Ranj A. - "In the Transformer architecture, the decoder differs from the encoder primarily in its additional mechanisms designed to handle autoregressive sequence generation. Here's a breakdown of the key differences: Self-Attention Mechanism: Encoder: The encoder has a standard self-attention mechanism that allows each token to attend to all other tokens in the input sequence. Decoder: The decoder has two types of self-attention. The first is the same as in the encoder, but the second is mas…"

    Data Scientist
    Statistics & Experimentation
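
    In the original Transformer, the decoder's self-attention sublayer is the masked (causal) one, and its other attention sublayer attends to the encoder output. Masking makes generation autoregressive: position i may attend only to positions 0..i. This is done by setting scores for future positions to negative infinity before the softmax. A minimal pure-Python sketch of just that masking step on a precomputed score matrix. Computing the scores from queries and keys is omitted, and the score values are arbitrary.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over an n x n score matrix with future positions masked out."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Causal mask: keep j <= i; set j > i to -inf so exp() contributes 0.
        row = [scores[i][j] if j <= i else -math.inf for j in range(n)]
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Arbitrary scores: every row is identical, so only the mask changes the weights.
weights = causal_attention_weights([[1.0, 2.0, 3.0]] * 3)
for row in weights:
    print([round(w, 3) for w in row])
```

    The first token can attend only to itself, so its row is [1, 0, 0]. Every entry above the diagonal is exactly zero.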
  • Saurabh K. - "P(A) = 0.6, P(B) = 0.4, P(D|A) = 0.05, P(D|B) = 0.03. The question asks for P(A|D). P(A|D) = P(D|A) × P(A) / P(D) = (0.05 × 0.6) / (P(D|A) × P(A) + P(D|B) × P(B)) = (0.05 × 0.6) / (0.05 × 0.6 + 0.03 × 0.4) = 0.030 / 0.042 = 30/42 = 5/7 ≈ 0.714. Note that P(D) = P(D|A) × P(A) + P(D|B) × P(B) by the law of total probability."

    Data Scientist
    Statistics & Experimentation
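
    The calculation above can be checked with exact fractions, which also makes the intermediate P(D) = 0.042 visible. A sketch using the probabilities given in the answer (A and B are the two sources, D is the defective outcome):

```python
from fractions import Fraction

def posterior(prior_a, prior_b, p_d_given_a, p_d_given_b):
    """Return (P(A|D), P(D)) via Bayes' theorem, with P(D) from the law of total probability."""
    p_d = p_d_given_a * prior_a + p_d_given_b * prior_b
    return p_d_given_a * prior_a / p_d, p_d

p_a_given_d, p_d = posterior(Fraction("0.6"), Fraction("0.4"), Fraction("0.05"), Fraction("0.03"))
print(p_d, p_a_given_d, float(p_a_given_d))  # 21/500 5/7 0.7142857142857143
```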
  • Jiin S. - "I would use A/B testing to see whether the new feature is incrementally beneficial. To begin, we should define the goal of the test. Let's say the new feature should increase the average number of trades by X. Then randomly assign clients to two groups, control and test. The control group doesn't see the new feature and the test group does. We could also use stratified sampling if we want to make sure we cover different customer segments. During this desig…"

    Data Scientist
    Statistics & Experimentation
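
    The design described above can be sketched as follows. Stratified random assignment shuffles clients within each segment and splits each segment in half. The comparison is a two-sample z-test on the difference in average trades with an unpooled (Welch-style) standard error, a reasonable large-sample approximation and only one of several valid test choices. Client IDs, segments and trade counts are hypothetical.

```python
import math
import random
import statistics
from collections import defaultdict

def stratified_assign(clients, seed=0):
    """clients: list of (client_id, segment). Returns {client_id: 'control' or 'test'}."""
    rng = random.Random(seed)
    by_segment = defaultdict(list)
    for client_id, segment in clients:
        by_segment[segment].append(client_id)
    groups = {}
    for ids in by_segment.values():
        rng.shuffle(ids)  # randomize within the stratum
        half = len(ids) // 2
        for client_id in ids[:half]:
            groups[client_id] = "control"
        for client_id in ids[half:]:
            groups[client_id] = "test"
    return groups

def diff_in_means_z_test(control, test):
    """Return (test mean - control mean, two-sided p-value) under a normal approximation."""
    diff = statistics.fmean(test) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(control) / len(control)
                   + statistics.variance(test) / len(test))
    z = diff / se
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return diff, p_value

# Hypothetical clients in two segments, and hypothetical trades per client.
clients = [(i, "retail" if i % 2 else "pro") for i in range(100)]
groups = stratified_assign(clients)
control_trades = [1, 2, 3, 2, 1, 2, 3, 2] * 10
test_trades = [3, 4, 5, 4, 3, 4, 5, 4] * 10
print(diff_in_means_z_test(control_trades, test_trades))
```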
  • Cesar F. - "Outliers are data points that significantly deviate from the majority of the data distribution. They can arise due to various reasons, such as measurement errors, natural variability, or rare events. Outliers can distort statistical analyses and machine learning models, making it crucial to detect and handle them properly."

    Data Scientist
    Statistics & Experimentation
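
    The answer doesn't name a detection method. One common, illustrative choice is Tukey's IQR rule, which flags points below Q1 − k·IQR or above Q3 + k·IQR. The conventional k = 1.5 is a default, not a requirement. A minimal sketch using the standard library's quartiles:

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # [95]
```

    Whether a flagged point is then removed, capped, or kept depends on its cause (measurement error vs. a genuine rare event), as the answer notes.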
  • Asked at Meta (Facebook)
    Video answer for 'Explain Bayes' theorem.'

    Will I. - "Is it bad to get the answer a different way? Will they mark that as not knowing Bayes' theorem, or as correct since it is an easier way to get the answer? The way I went was to look at what happens when the factory makes 100 light bulbs. Machine A makes 60, of which 3 are faulty; Machine B makes 40, of which 1.2 are faulty (on average). Therefore the pool of faulty light bulbs is 3/4.2 = 5/7 from Machine A and 1.2/4.2 = 2/7 from Machine B."

    Data Scientist
    Concept
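
    The counting approach in this comment is Bayes' theorem expressed in natural frequencies, and it yields the same result as the formula. A sketch that scales to a hypothetical batch of 1,000 bulbs, so the expected counts are whole numbers. It uses the shares (60%/40%) and defect rates (5%/3%) from this question.

```python
from fractions import Fraction

def share_of_defects(batch, share_a, defect_a, defect_b):
    """Expected faulty counts per machine and each machine's share of all faulty bulbs."""
    made_a = batch * share_a
    made_b = batch * (1 - share_a)
    faulty_a = made_a * defect_a
    faulty_b = made_b * defect_b
    total_faulty = faulty_a + faulty_b
    return faulty_a, faulty_b, faulty_a / total_faulty, faulty_b / total_faulty

result = share_of_defects(1000, Fraction("0.6"), Fraction("0.05"), Fraction("0.03"))
print(result)  # 30 and 12 faulty bulbs; shares 5/7 and 2/7
```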
  • Asked at Amazon

    Erjan G. - "1) select avg(session) from table where session > 180 2) select round(session_time/300)*300 as session_bin, count(*) as session_count from table group by round(session_time/300)*300 order by session_bin 3) SELECT t1.country AS country_a, t2.country AS country_b FROM ( SELECT country, COUNT(*) AS session_count FROM yourtablename GROUP BY country ) AS t1 JOIN ( SELECT country, COUNT(*) AS session_count FROM yourtablename GROUP BY countr…"

    Data Scientist
    Coding
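
    The first two queries can be exercised end to end with Python's built-in sqlite3 module. The table name `sessions` and the column `session_time` (duration in seconds) are assumptions, since the original question text isn't shown. Dividing by 300.0 rather than 300 avoids integer division in SQLite. ROUND assigns each session to the nearest multiple of 300 seconds; use a floor instead if bins should start at 0, 300, 600, and so on.

```python
import sqlite3

# Hypothetical schema: one row per session, duration in seconds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_time INTEGER)")
conn.executemany(
    "INSERT INTO sessions (session_time) VALUES (?)",
    [(t,) for t in (100, 200, 290, 310, 460, 900)],
)

# 1) Average duration of sessions longer than 180 seconds.
avg_long = conn.execute(
    "SELECT AVG(session_time) FROM sessions WHERE session_time > 180"
).fetchone()[0]

# 2) Histogram in 300-second bins; 300.0 forces real (not integer) division.
bins = conn.execute(
    """
    SELECT ROUND(session_time / 300.0) * 300 AS session_bin,
           COUNT(*) AS session_count
    FROM sessions
    GROUP BY session_bin
    ORDER BY session_bin
    """
).fetchall()

print(avg_long)
print(bins)
conn.close()
```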
  • Asked at Amazon
    Video answer for 'Implement k-means clustering.'
    Data Scientist
    Coding
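
    A minimal sketch of k-means via Lloyd's algorithm in pure Python (no NumPy). It assumes points are tuples of numbers, distance is Euclidean, and initial centroids are drawn at random from the data. Production code would typically use k-means++ initialization and several restarts.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids taken from the data
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```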