Skip to main content

Data Scientist Interview Questions

Review this list of 174 Data Scientist interview questions and answers verified by hiring managers and candidates.
  • +4

    "Step 1: Define Objectives and Key Metrics Objectives: Understand the demand for group video calling. Assess the potential impact on user engagement. Identify technical and user experience considerations. Key Metrics: Call Frequency: Number of 1:1 calls per user. Call Duration: Average duration of 1:1 calls. Call Participants: Identify users who frequently call multiple individuals. Concurrent Calls: Instances where users are engaged in multiple 1:1 call"

    Bhavna S. - "Step 1: Define Objectives and Key Metrics Objectives: Understand the demand for group video calling. Assess the potential impact on user engagement. Identify technical and user experience considerations. Key Metrics: Call Frequency: Number of 1:1 calls per user. Call Duration: Average duration of 1:1 calls. Call Participants: Identify users who frequently call multiple individuals. Concurrent Calls: Instances where users are engaged in multiple 1:1 call"See full answer

    Data Scientist
  • TikTok logoAsked at TikTok 
    5 answers
    Video answer for 'Define success for TikTok.'
    +2

    "Mission: Tiktok's mission is to inspire creativity and Joy. Any business wants to make sure that they are serving the value to their customers: For TikTok customers are: Viewers 2. Content Creators 3. Advertisers So few metrics we could measure are: Time spent/day Total no of videos created/day engagement rate = users who interacted in one of the meaningful action on Tiktok / total users at a day level either likes, share, watched vide for at least 5 mins, created video "

    Nikita B. - "Mission: Tiktok's mission is to inspire creativity and Joy. Any business wants to make sure that they are serving the value to their customers: For TikTok customers are: Viewers 2. Content Creators 3. Advertisers So few metrics we could measure are: Time spent/day Total no of videos created/day engagement rate = users who interacted in one of the meaningful action on Tiktok / total users at a day level either likes, share, watched vide for at least 5 mins, created video "See full answer

    Data Scientist
    Analytical
    +1 more
  • 22 answers
    +17

    "--country names are UPPERCASE but the table in the in the question showing lowercase. That's why it took me a while to figure it out until I ran the country column WITH RECURSIVE Hierarchy AS ( SELECT e.Emp_ID, CONCAT(e.FirstName, ' ', e.MiddleName, ' ', e.LastName) AS FullName, e.Manager_ID, 0 AS Level, CASE WHEN e.Country = 'IRELAND' THEN s.Salary * 1.09 WHEN e.Country = 'INDIA' THEN s.Salary * 0.012 ELSE s.Salary "

    Victor N. - "--country names are UPPERCASE but the table in the in the question showing lowercase. That's why it took me a while to figure it out until I ran the country column WITH RECURSIVE Hierarchy AS ( SELECT e.Emp_ID, CONCAT(e.FirstName, ' ', e.MiddleName, ' ', e.LastName) AS FullName, e.Manager_ID, 0 AS Level, CASE WHEN e.Country = 'IRELAND' THEN s.Salary * 1.09 WHEN e.Country = 'INDIA' THEN s.Salary * 0.012 ELSE s.Salary "See full answer

    Data Scientist
    Coding
    +3 more
  • Adobe logoAsked at Adobe 
    66 answers
    Video answer for 'Move all zeros to the end of an array.'
    +61

    "Initialize left pointer: Set a left pointer left to 0. Iterate through the array: Iterate through the array from left to right. If the current element is not 0, swap it with the element at the left pointer and increment left. Time complexity: O(n). The loop iterates through the entire array once, making it linear time. Space complexity: O(1). The algorithm operates in-place, modifying the input array directly without using additional data structures. "

    Avon T. - "Initialize left pointer: Set a left pointer left to 0. Iterate through the array: Iterate through the array from left to right. If the current element is not 0, swap it with the element at the left pointer and increment left. Time complexity: O(n). The loop iterates through the entire array once, making it linear time. Space complexity: O(1). The algorithm operates in-place, modifying the input array directly without using additional data structures. "See full answer

    Data Scientist
    Data Structures & Algorithms
    +4 more
  • Amazon logoAsked at Amazon 
    9 answers
    +6

    "DFS with check of an already seen node in the graph would work from collections import deque, defaultdict from typing import List def iscourseloopdfs(idcourse: int, graph: defaultdict[list]) -> bool: stack = deque([(id_course)]) seen_courses = set() while stack: print(stack) curr_course = stack.pop() if currcourse in seencourses: return True seencourses.add(currcourse) for dependency in graph[curr_course]: "

    Gabriele G. - "DFS with check of an already seen node in the graph would work from collections import deque, defaultdict from typing import List def iscourseloopdfs(idcourse: int, graph: defaultdict[list]) -> bool: stack = deque([(id_course)]) seen_courses = set() while stack: print(stack) curr_course = stack.pop() if currcourse in seencourses: return True seencourses.add(currcourse) for dependency in graph[curr_course]: "See full answer

    Data Scientist
    Data Structures & Algorithms
    +4 more
  • 🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

  • Adobe logoAsked at Adobe 
    19 answers
    Video answer for 'Given stock prices for the next n days, how can you maximize your profit by buying or selling one share per day?'
    +14

    "public static int maxProfitGreedy(int[] stockPrices) { int maxProfit = 0; for(int i = 1; i todayPrice) { maxProfit += tomorrowPrice - todayPrice; } } return maxProfit; } "

    Laksitha R. - "public static int maxProfitGreedy(int[] stockPrices) { int maxProfit = 0; for(int i = 1; i todayPrice) { maxProfit += tomorrowPrice - todayPrice; } } return maxProfit; } "See full answer

    Data Scientist
    Data Structures & Algorithms
    +4 more
  • Lyft logoAsked at Lyft 
    7 answers
    Video answer for 'A $5 discount coupon is given to N riders. The probability of using a coupon is P. What is the expected cost for the company?'
    +4

    "Is there a reason a confidence interval was used to solve this problem over just using the mean/expected value directly?"

    Aarav G. - "Is there a reason a confidence interval was used to solve this problem over just using the mean/expected value directly?"See full answer

    Data Scientist
    Statistics & Experimentation
  • Meta logoAsked at Meta 
    2 answers

    "Clarifying Questions and possible responses: both audio and video goals: increase engagement time among groups/communitites and not require another platform to do group call (be one-stop for communication) region-TBD ios/android only available to users in a group to call users within the group who can intitiate these calls?- only admin? or anyone? metrics:NSM: feature engagement (C), number of calls made in a week per user (C). PM: % of people joining the call in a group"

    theproductguy - "Clarifying Questions and possible responses: both audio and video goals: increase engagement time among groups/communitites and not require another platform to do group call (be one-stop for communication) region-TBD ios/android only available to users in a group to call users within the group who can intitiate these calls?- only admin? or anyone? metrics:NSM: feature engagement (C), number of calls made in a week per user (C). PM: % of people joining the call in a group"See full answer

    Data Scientist
    Data Analysis
    +2 more
  • OpenAI logoAsked at OpenAI 
    Add answer
    Data Scientist
    Behavioral
    +6 more
  • Google logoAsked at Google 
    2 answers

    "Precision - Out of all the things we picked as correct, how many were actually correct? recall - Out of all the things that were truly correct, how many did we actually find?"

    Vineet M. - "Precision - Out of all the things we picked as correct, how many were actually correct? recall - Out of all the things that were truly correct, how many did we actually find?"See full answer

    Data Scientist
    Statistics & Experimentation
  • 28 answers
    +25

    "select name, stock from products p left join transactions t on p.id = t.product_id order by date desc limit 1"

    Daniel C. - "select name, stock from products p left join transactions t on p.id = t.product_id order by date desc limit 1"See full answer

    Data Scientist
    Coding
    +3 more
  • Amazon logoAsked at Amazon 
    2 answers
    Video answer for 'Implement k-means clustering.'

    "at first I want to know number of cluster I will put random number if I don't know and I will use method called Elbow method or Silhouette Score ,Gap Statistic and Davies–Bouldin Index to know the best number of cluster and I will use scikit-learn library to import kmeans from sklearn.cluster import KMeans kmeans = KMeans(nclusters=2, randomstate=0) kmeans.fit(X) and X this my data "

    Taheia S. - "at first I want to know number of cluster I will put random number if I don't know and I will use method called Elbow method or Silhouette Score ,Gap Statistic and Davies–Bouldin Index to know the best number of cluster and I will use scikit-learn library to import kmeans from sklearn.cluster import KMeans kmeans = KMeans(nclusters=2, randomstate=0) kmeans.fit(X) and X this my data "See full answer

    Data Scientist
    Analytical
    +5 more
  • Discord logoAsked at Discord 
    2 answers

    " A couple of years ago, we were working on a project to integrate a new third-party data feed into our existing data processing pipeline. This data feed was critical for enhancing our trading algorithms with more comprehensive market data. Given the tight timeline and high stakes, I decided to push for a rapid implementation. In my eagerness to meet the deadline, I underestimated the complexity of integrating this new data feed. I did not allocate sufficient time for thorough testing and valida"

    Scott S. - " A couple of years ago, we were working on a project to integrate a new third-party data feed into our existing data processing pipeline. This data feed was critical for enhancing our trading algorithms with more comprehensive market data. Given the tight timeline and high stakes, I decided to push for a rapid implementation. In my eagerness to meet the deadline, I underestimated the complexity of integrating this new data feed. I did not allocate sufficient time for thorough testing and valida"See full answer

    Data Scientist
    Behavioral
    +2 more
  • Goldman Sachs logoAsked at Goldman Sachs 
    4 answers
    Video answer for 'Explain Bayes' theorem.'
    +1

    "Is it bad to get the answer a different way? Will they mark that as not knowing Bayes Theorem or just correct as it is an easier way to get the answer? The way I went is to look at what happens when the factory makes 100 light bulbs. Machine A makes 60 of which 3 are faulty, Machine B makes 40 of which 1.2 are faulty. Therefore the pool of faulty lightbulbs is 3/4.2 = 5/7 from machine A and 1.2/4.2 = 3/7 from Machine B."

    Will I. - "Is it bad to get the answer a different way? Will they mark that as not knowing Bayes Theorem or just correct as it is an easier way to get the answer? The way I went is to look at what happens when the factory makes 100 light bulbs. Machine A makes 60 of which 3 are faulty, Machine B makes 40 of which 1.2 are faulty. Therefore the pool of faulty lightbulbs is 3/4.2 = 5/7 from machine A and 1.2/4.2 = 3/7 from Machine B."See full answer

    Data Scientist
    Concept
    +2 more
  • Adobe logoAsked at Adobe 
    15 answers
    +10

    " function climbStairs(n) { // 4 iterations of Dynamic Programming solutions: // Step 1: Recursive: // if (n <= 2) return n // return climbStairs(n-1) + climbStairs(n-2) // Step 2: Top-down Memoization // const memo = {0:0, 1:1, 2:2} // function f(x) { // if (x in memo) return memo[x] // memo[x] = f(x-1) + f(x-2) // return memo[x] // } // return f(n) // Step 3: Bottom-up Tabulation // const tab = [0,1,2] // f"

    Matthew K. - " function climbStairs(n) { // 4 iterations of Dynamic Programming solutions: // Step 1: Recursive: // if (n <= 2) return n // return climbStairs(n-1) + climbStairs(n-2) // Step 2: Top-down Memoization // const memo = {0:0, 1:1, 2:2} // function f(x) { // if (x in memo) return memo[x] // memo[x] = f(x-1) + f(x-2) // return memo[x] // } // return f(n) // Step 3: Bottom-up Tabulation // const tab = [0,1,2] // f"See full answer

    Data Scientist
    Data Structures & Algorithms
    +3 more
  • 14 answers
    +11

    "In the question it says: "above the overall average total posts", which to me implying a >, yet in the solution it uses >= Caused me 1 hr to find out. plz fix"

    Peter W. - "In the question it says: "above the overall average total posts", which to me implying a >, yet in the solution it uses >= Caused me 1 hr to find out. plz fix"See full answer

    Data Scientist
    Coding
    +3 more
  • Google logoAsked at Google 
    5 answers
    +2

    "WITH RECURSIVE fibonacci_series AS ( SELECT 1 AS n, 0 AS fib1, 1 AS fib2 UNION ALL SELECT n + 1 AS n, fib2 AS fib1, fib1 + fib2 AS fib2 FROM fibonacci_series WHERE n < 20 -- Limit the series to 20 numbers ) SELECT n, fib1 AS fib FROM fibonacci_series ORDER BY n; `"

    Yashasvi V. - "WITH RECURSIVE fibonacci_series AS ( SELECT 1 AS n, 0 AS fib1, 1 AS fib2 UNION ALL SELECT n + 1 AS n, fib2 AS fib1, fib1 + fib2 AS fib2 FROM fibonacci_series WHERE n < 20 -- Limit the series to 20 numbers ) SELECT n, fib1 AS fib FROM fibonacci_series ORDER BY n; `"See full answer

    Data Scientist
    Coding
    +4 more
  • Visa logoAsked at Visa 
    1 answer

    "There are couple of reasons for it - Kind of role : Its a product manager role loaded with analytical work, So working with data in stringent regulatory guideline make it more exciting and thrilling. Location & industry is like - Cherry on the cake, Bangalore weather and BFI is at its all time peak as people spending behavior is changing continuously, it will be interesting to see big giants like visa are managing it."

    Nidhi S. - "There are couple of reasons for it - Kind of role : Its a product manager role loaded with analytical work, So working with data in stringent regulatory guideline make it more exciting and thrilling. Location & industry is like - Cherry on the cake, Bangalore weather and BFI is at its all time peak as people spending behavior is changing continuously, it will be interesting to see big giants like visa are managing it."See full answer

    Data Scientist
    Behavioral
    +4 more
  • 27 answers
    +24

    "with cte as (select ts.employee_id, e.name, t.id as test_id, max(distinct ts.score) as total from test_results as ts join tests as t on ts.test_id = t.id join employees as e on ts.employee_id = e.id group by ts.employee_id, e.name, t.id) select employee_id, name as employee_name, sum(total) as total_score from cte group by employee_id, employee_name order by total_score desc, employee_id asc ;"

    Christian B. - "with cte as (select ts.employee_id, e.name, t.id as test_id, max(distinct ts.score) as total from test_results as ts join tests as t on ts.test_id = t.id join employees as e on ts.employee_id = e.id group by ts.employee_id, e.name, t.id) select employee_id, name as employee_name, sum(total) as total_score from cte group by employee_id, employee_name order by total_score desc, employee_id asc ;"See full answer

    Data Scientist
    Coding
    +3 more
  • OpenAI logoAsked at OpenAI 
    Add answer
    Data Scientist
    Statistics & Experimentation
Showing 41-60 of 174