Data Scientist Interview Questions

Review this list of 278 data scientist interview questions and answers verified by hiring managers and candidates.
  • Cesar F. - "Outliers are data points that significantly deviate from the majority of the data distribution. They can arise due to various reasons, such as measurement errors, natural variability, or rare events. Outliers can distort statistical analyses and machine learning models, making it crucial to detect and handle them properly."

    Data Scientist
    Statistics & Experimentation
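One common way to detect the outliers described in this answer is the interquartile-range (IQR) rule. The sketch below is only an illustration: the function name `iqr_outliers`, the multiplier `k=1.5`, and the sample data are assumptions and do not come from the answer itself.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one obviously extreme reading.
data = [10, 12, 11, 13, 12, 11, 95, 10, 12, 13]
outliers = iqr_outliers(data)
print(outliers)
```

Whether a flagged point should be dropped, capped, or kept depends on whether it is a measurement error or a genuine rare event, which is the distinction the answer draws.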
  • Asked at Google
    +2

    Yashasvi V. - "WITH RECURSIVE fibonacci_series AS ( SELECT 1 AS n, 0 AS fib1, 1 AS fib2 UNION ALL SELECT n + 1 AS n, fib2 AS fib1, fib1 + fib2 AS fib2 FROM fibonacci_series WHERE n < 20 /* limit the series to 20 numbers */ ) SELECT n, fib1 AS fib FROM fibonacci_series ORDER BY n;"

    Data Scientist
    Coding
    +3 more
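For readers more comfortable with Python than with recursive CTEs, the same iteration can be sketched as a loop. This mirrors the query's columns (`n`, `fib1`, `fib2`) and its stopping condition; the function name `fibonacci_series` and the `limit` parameter are illustrative choices.

```python
def fibonacci_series(limit=20):
    """Return (n, fib) rows, mirroring the recursive CTE row by row."""
    rows = []
    n, fib1, fib2 = 1, 0, 1  # anchor row: SELECT 1 AS n, 0 AS fib1, 1 AS fib2
    while True:
        rows.append((n, fib1))
        if n >= limit:  # the CTE only recurses while n < limit
            break
        # recursive step: n + 1, fib2 AS fib1, fib1 + fib2 AS fib2
        n, fib1, fib2 = n + 1, fib2, fib1 + fib2
    return rows


rows = fibonacci_series()
for n, fib in rows[:5]:
    print(n, fib)
```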
  • Asked at Discord

    Scott S. - "A couple of years ago, we were working on a project to integrate a new third-party data feed into our existing data processing pipeline. This data feed was critical for enhancing our trading algorithms with more comprehensive market data. Given the tight timeline and high stakes, I decided to push for a rapid implementation. In my eagerness to meet the deadline, I underestimated the complexity of integrating this new data feed. I did not allocate sufficient time for thorough testing and valida"

    Data Scientist
    Behavioral
    +2 more
  • Asked at Swiggy

    Harish K. - "Strategies Swiggy could implement to increase the average order value (AOV) on its platform: 1. Smart Recommendations and Upselling: Personalized suggestions: Leverage data to recommend items based on past orders, popular choices, and trending items in the user's area. Upselling prompts: When a user adds an item to their cart, suggest related or higher-value items (e.g., "Would you like to add a side of fries with that?" or "Upgrade to a large for just ₹X more"). Bundle deals: Offer c"

    Data Scientist
    Analytical
    +3 more
  • Data Scientist
    Data Analysis
  • Asked at Adobe
    +6

    Matthew K. - "function climbStairs(n) {
      // 4 iterations of Dynamic Programming solutions:
      // Step 1: Recursive:
      // if (n <= 2) return n
      // return climbStairs(n-1) + climbStairs(n-2)
      // Step 2: Top-down Memoization
      // const memo = {0:0, 1:1, 2:2}
      // function f(x) {
      //   if (x in memo) return memo[x]
      //   memo[x] = f(x-1) + f(x-2)
      //   return memo[x]
      // }
      // return f(n)
      // Step 3: Bottom-up Tabulation
      // const tab = [0,1,2]
      // f"

    Data Scientist
    Data Structures & Algorithms
    +3 more
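The preview above is cut off before its bottom-up steps. The following is a minimal sketch of where that progression usually ends up, not a reconstruction of the author's remaining code. It assumes the standard climbing-stairs problem (count the ways to climb `n` steps taking 1 or 2 at a time) and keeps the preview's base case of returning `n` when `n <= 2`.

```python
def climb_stairs(n: int) -> int:
    """Count the ways to climb n steps taking 1 or 2 steps at a time."""
    if n <= 2:
        return n  # same base case as the recursive version in the preview
    prev, curr = 1, 2  # ways to reach step 1 and step 2
    for _ in range(3, n + 1):
        # ways(i) = ways(i - 1) + ways(i - 2); only two values are kept
        prev, curr = curr, prev + curr
    return curr


print(climb_stairs(5))
```

Keeping only the last two values reduces the memory use of the tabulated version from O(n) to O(1) while the running time stays O(n).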
  • Asked at Adobe
    Video answer for 'Move all zeros to the end of an array.'
    +39

    Devesh K. - "This solution is much faster than the Exponent reference solution. It is also far more concise and easier to understand:

      from typing import List

      def moveZerosToEnd(arr: List[int]) -> List[int]:
          left = 0  # next position that should receive a non-zero value
          for right in range(len(arr)):
              if arr[right] != 0:
                  if left != right:
                      # swap the non-zero value forward, pushing the zero back
                      arr[left], arr[right] = arr[right], arr[left]
                  left += 1
          return arr"

    Data Scientist
    Data Structures & Algorithms
    +4 more
  • Data Scientist
    Statistics & Experimentation
  • Asked at Meta (Facebook)
    Video answer for 'Explain Bayes' theorem.'

    Will I. - "Is it bad to get the answer a different way? Will they mark that as not knowing Bayes' theorem, or just mark it correct since it is an easier way to get the answer? The way I went is to look at what happens when the factory makes 100 light bulbs. Machine A makes 60, of which 3 are faulty; Machine B makes 40, of which 1.2 are faulty. Therefore, of the pool of faulty light bulbs, 3/4.2 = 5/7 come from Machine A and 1.2/4.2 = 2/7 come from Machine B."

    Data Scientist
    Concept
    +2 more
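The counting argument in this answer is Bayes' theorem in disguise, and the arithmetic can be checked exactly with fractions. The fault rates below (5% for Machine A, 3% for Machine B) are inferred from the answer's figures of 3 faulty out of 60 and 1.2 faulty out of 40; the function name `posterior` is illustrative.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Apply Bayes' theorem: P(H | E) = P(H) * P(E | H) / P(E)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(E), the total probability of a fault
    return {h: joint[h] / evidence for h in joint}


priors = {"A": Fraction(60, 100), "B": Fraction(40, 100)}     # share of production
fault_rates = {"A": Fraction(5, 100), "B": Fraction(3, 100)}  # P(faulty | machine)

result = posterior(priors, fault_rates)
print(result)
```

Given a faulty bulb, the probability that it came from Machine A is 5/7 and from Machine B is 2/7, and the two sum to 1 as they must.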
  • +6

    Devanshu K. - "with cte as (select count(post_id) * 1.0 / count(distinct user_id) as avg_post, avg(is_successful_post) as avg_success from post) select p.user_id, sum(p.is_successful_post) as post_success, count(p.post_id) as post_attempt, round(avg(p.is_successful_post), 2) as post_success_rate from post p cross join cte c group by p.user_id, c.avg_post, c.avg_success having post_attempt > c.avg_post and post_success_rate < c.avg_success order by post_success_rate desc"

    Data Scientist
    Coding
    +1 more
  • Asked at Meta (Facebook)
    Data Scientist
    Analytical
  • Asked at Amazon

    Erjan G. - "1) select avg(session) from table where session > 180 2) select round(session_time/300)*300 as session_bin, count(*) as session_count from table group by round(session_time/300)*300 order by session_bin 3) SELECT t1.country AS country_a, t2.country AS country_b FROM ( SELECT country, COUNT(*) AS session_count FROM your_table_name GROUP BY country ) AS t1 JOIN ( SELECT country, COUNT(*) AS session_count FROM your_table_name GROUP BY countr"

    Data Scientist
    Coding
    +3 more
  • Asked at Adobe
    Video answer for 'Given the root of a binary tree of integers, return the maximum path sum.'

    Jerry O. - "# Definition for a binary tree node.
      class TreeNode:
          def __init__(self, val=0, left=None, right=None):
              self.val = val
              self.left = left
              self.right = right

      class Solution:
          def maxPathSum(self, root: TreeNode) -> int:
              self.max_sum = float('-inf')"

    Data Scientist
    Data Structures & Algorithms
    +4 more
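The preview stops right after initializing the running maximum. One common way to finish the problem is a post-order traversal in which each node reports its best downward path to its parent. The sketch below illustrates that approach and is not the author's remaining code; the names `max_path_sum` and `gain` are illustrative, and a non-empty tree is assumed.

```python
from typing import Optional


class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def max_path_sum(root: Optional[TreeNode]) -> int:
    """Return the largest sum over all node-to-node paths in the tree."""
    best = float("-inf")

    def gain(node: Optional[TreeNode]) -> int:
        """Best sum of a path that starts at node and goes downward."""
        nonlocal best
        if node is None:
            return 0
        # A negative branch can only hurt, so clamp its contribution at 0.
        left = max(gain(node.left), 0)
        right = max(gain(node.right), 0)
        # Best path that bends at this node and uses both branches.
        best = max(best, node.val + left + right)
        # Only one branch can be extended upward to the parent.
        return node.val + max(left, right)

    gain(root)
    return best


tree = TreeNode(-10, TreeNode(9), TreeNode(20, TreeNode(15), TreeNode(7)))
print(max_path_sum(tree))
```

Each node is visited once, so the traversal runs in O(n) time with recursion depth equal to the height of the tree.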
  • +15

    Palak S. - "WITH cte AS ( SELECT employee_id, test_id, MAX(score) AS maximum_scores FROM test_results GROUP BY 1, 2 ) SELECT c.employee_id, e.name AS employee_name, SUM(c.maximum_scores) AS total_score FROM cte c JOIN employees e ON c.employee_id = e.id GROUP BY 1, 2 ORDER BY total_score DESC"

    Data Scientist
    Coding
    +1 more
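To see what this query computes, it can be run against a small in-memory SQLite database. The schema below is inferred from the column names in the answer, and the sample rows are invented for illustration; neither comes from the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema inferred from the query's column names.
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE test_results (employee_id INTEGER, test_id INTEGER, score INTEGER);
    INSERT INTO employees VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO test_results VALUES
        (1, 101, 70), (1, 101, 90), (1, 102, 80),
        (2, 101, 85), (2, 102, 60), (2, 102, 95);
    """
)

QUERY = """
WITH cte AS (
    -- keep only the best attempt per employee and test
    SELECT employee_id, test_id, MAX(score) AS maximum_scores
    FROM test_results
    GROUP BY employee_id, test_id
)
SELECT c.employee_id, e.name AS employee_name, SUM(c.maximum_scores) AS total_score
FROM cte c
JOIN employees e ON c.employee_id = e.id
GROUP BY c.employee_id, e.name
ORDER BY total_score DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With this sample data, repeated attempts at the same test count only once, at their highest score, before the per-employee totals are summed.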
  • +18

    Zacharias E. - "SELECT pro.id, pro.title, pro.budget, COUNT(ep.employee_id) AS num_employees, SUM(e.salary) AS total_salaries FROM projects pro JOIN employees_projects ep ON ep.project_id = pro.id JOIN employees e ON e.id = ep.employee_id GROUP BY pro.id, pro.title, pro.budget;"

    Data Scientist
    Coding
    +1 more
  • Video answer for 'E-commerce (2 of 5)'
    +10

    Salome L. - "SELECT items.item_category, SUM(orders.order_quantity) AS total_units_ordered_last_7_days FROM orders JOIN items ON orders.item_id = items.item_id WHERE orders.order_date BETWEEN DATE('now', '-6 days') AND DATE('now') GROUP BY items.item_category"

    Data Scientist
    Coding
    +1 more
  • Data Scientist
    Statistics & Experimentation
  • Asked at Amazon
    Video answer for 'Implement k-means clustering.'
    Data Scientist
    Coding
    +4 more
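The listing shows no written answer for this question. As a starting point, here is a minimal sketch of Lloyd's algorithm, the usual way to implement k-means, using only the standard library. The function signature, the `seed` parameter, and the sample points are assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of numbers) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

In an interview it is worth mentioning that the result depends on initialization, so implementations commonly use several random restarts or k-means++ seeding.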
  • Asked at Amazon
    +2

    Judy W. - "Situation: COVID has impacted everyone's lives, especially small businesses. Earlier this year, during the second lockdown in Malaysia, it was estimated that 50%-70% of small businesses had closed. It got me thinking: beyond the existing training programmes, what can my company do to support small businesses? Task: So, I took the initiative to gather our Comms and Government Affairs teams to work together and explore how we can: 1) meaningfully demonstrate our company's commitment in"

    Data Scientist
    Behavioral
    +2 more
  • Asked at OpenAI
    Data Scientist
    Behavioral
    +5 more
Showing 81-100 of 278