Data Engineer Interview Questions

Review this list of 154 data engineer interview questions and answers verified by hiring managers and candidates.
  • "Missing Item - User ordered multiple items, few items are missing Wrong Item - Entire order is wrong / there are items in the order that were never ordered How is this measured ? CSAT Missing Items Wrong Items Step 1 : Collect data on orders that reported missing / wrong items. Dive deep to understand if the problem is isolated to a specific metro/zip code/restaurant type (say fast food vs fine dine), time of day (lunch vs dinner), tenure of the courier on th"

    Saurabh K. - "Missing Item - User ordered multiple items, few items are missing Wrong Item - Entire order is wrong / there are items in the order that were never ordered How is this measured ? CSAT Missing Items Wrong Items Step 1 : Collect data on orders that reported missing / wrong items. Dive deep to understand if the problem is isolated to a specific metro/zip code/restaurant type (say fast food vs fine dine), time of day (lunch vs dinner), tenure of the courier on th"See full answer

    Data Engineer
    Statistics & Experimentation
  • Data Engineer
    Data Modeling
  • +1

    "This is yet another classic case of evolution of data landscape to account for diversities in the data formats sacrificing restrictive but key components at first and added later to make the solution more effective. Data warehouse -> Data Lake -> Data Lakehouse (Data Lake + Data Warehouse) Data warehouse - A solution to store data in central place (analytics (read) heavy) with stringent schema (structured). Very useful for historical queries and analytics. Schema on write check. Only used for"

    Karthik R. - "This is yet another classic case of evolution of data landscape to account for diversities in the data formats sacrificing restrictive but key components at first and added later to make the solution more effective. Data warehouse -> Data Lake -> Data Lakehouse (Data Lake + Data Warehouse) Data warehouse - A solution to store data in central place (analytics (read) heavy) with stringent schema (structured). Very useful for historical queries and analytics. Schema on write check. Only used for"See full answer

    Data Engineer
    Data Pipeline Design
  • +13

    "--country names are UPPERCASE but the table in the in the question showing lowercase. That's why it took me a while to figure it out until I ran the country column WITH RECURSIVE Hierarchy AS ( SELECT e.Emp_ID, CONCAT(e.FirstName, ' ', e.MiddleName, ' ', e.LastName) AS FullName, e.Manager_ID, 0 AS Level, CASE WHEN e.Country = 'IRELAND' THEN s.Salary * 1.09 WHEN e.Country = 'INDIA' THEN s.Salary * 0.012 ELSE s.Salary "

    Victor N. - "--country names are UPPERCASE but the table in the in the question showing lowercase. That's why it took me a while to figure it out until I ran the country column WITH RECURSIVE Hierarchy AS ( SELECT e.Emp_ID, CONCAT(e.FirstName, ' ', e.MiddleName, ' ', e.LastName) AS FullName, e.Manager_ID, 0 AS Level, CASE WHEN e.Country = 'IRELAND' THEN s.Salary * 1.09 WHEN e.Country = 'INDIA' THEN s.Salary * 0.012 ELSE s.Salary "See full answer

    Data Engineer
    Coding
    +3 more
  • +23

    "select name, stock from products p left join transactions t on p.id = t.product_id order by date desc limit 1"

    Daniel C. - "select name, stock from products p left join transactions t on p.id = t.product_id order by date desc limit 1"See full answer

    Data Engineer
    Coding
    +3 more
  • 🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

  • "What do all data scientists need to know about how to work with very large datasets? 37 Follow Request Answer More All related (39) Recommended 📷 Corrin Lakeland · Follow , M.S. Data Science, University of St. Thomas, St. Paul (2018)6yData Science consultant and managerUpvoted by[Tom Halloin](https://www.quora"

    Hayatu H. - "What do all data scientists need to know about how to work with very large datasets? 37 Follow Request Answer More All related (39) Recommended 📷 Corrin Lakeland · Follow , M.S. Data Science, University of St. Thomas, St. Paul (2018)6yData Science consultant and managerUpvoted by[Tom Halloin](https://www.quora"See full answer

    Data Engineer
    Data Modeling
  • Visa logoAsked at Visa 

    "There are couple of reasons for it - Kind of role : Its a product manager role loaded with analytical work, So working with data in stringent regulatory guideline make it more exciting and thrilling. Location & industry is like - Cherry on the cake, Bangalore weather and BFI is at its all time peak as people spending behavior is changing continuously, it will be interesting to see big giants like visa are managing it."

    Nidhi S. - "There are couple of reasons for it - Kind of role : Its a product manager role loaded with analytical work, So working with data in stringent regulatory guideline make it more exciting and thrilling. Location & industry is like - Cherry on the cake, Bangalore weather and BFI is at its all time peak as people spending behavior is changing continuously, it will be interesting to see big giants like visa are managing it."See full answer

    Data Engineer
    Behavioral
    +4 more
  • Adobe logoAsked at Adobe 
    Video answer for 'Product of Array Except Self'
    +51

    "If 0's aren't a concern, couldn't we just multiply all numbers. and then divide product by each number in the list ? if there's more than one zero, then we just return an array of 0s if there's one zero, then we just replace 0 with product and rest 0s. what am i missing?"

    Sachin R. - "If 0's aren't a concern, couldn't we just multiply all numbers. and then divide product by each number in the list ? if there's more than one zero, then we just return an array of 0s if there's one zero, then we just replace 0 with product and rest 0s. what am i missing?"See full answer

    Data Engineer
    Data Structures & Algorithms
    +3 more
  • +14

    "The user table no longer exists as expected - I get an error that user does not contain user_id. Note that querying the table results in only user:swuoevkivrjfta select * FROM user `"

    Evan R. - "The user table no longer exists as expected - I get an error that user does not contain user_id. Note that querying the table results in only user:swuoevkivrjfta select * FROM user `"See full answer

    Data Engineer
    Coding
    +3 more
  • " A couple of years ago, we were working on a project to integrate a new third-party data feed into our existing data processing pipeline. This data feed was critical for enhancing our trading algorithms with more comprehensive market data. Given the tight timeline and high stakes, I decided to push for a rapid implementation. In my eagerness to meet the deadline, I underestimated the complexity of integrating this new data feed. I did not allocate sufficient time for thorough testing and valida"

    Scott S. - " A couple of years ago, we were working on a project to integrate a new third-party data feed into our existing data processing pipeline. This data feed was critical for enhancing our trading algorithms with more comprehensive market data. Given the tight timeline and high stakes, I decided to push for a rapid implementation. In my eagerness to meet the deadline, I underestimated the complexity of integrating this new data feed. I did not allocate sufficient time for thorough testing and valida"See full answer

    Data Engineer
    Behavioral
    +2 more
  • Google logoAsked at Google 
    +2

    "WITH RECURSIVE fibonacci_series AS ( SELECT 1 AS n, 0 AS fib1, 1 AS fib2 UNION ALL SELECT n + 1 AS n, fib2 AS fib1, fib1 + fib2 AS fib2 FROM fibonacci_series WHERE n < 20 -- Limit the series to 20 numbers ) SELECT n, fib1 AS fib FROM fibonacci_series ORDER BY n; `"

    Yashasvi V. - "WITH RECURSIVE fibonacci_series AS ( SELECT 1 AS n, 0 AS fib1, 1 AS fib2 UNION ALL SELECT n + 1 AS n, fib2 AS fib1, fib1 + fib2 AS fib2 FROM fibonacci_series WHERE n < 20 -- Limit the series to 20 numbers ) SELECT n, fib1 AS fib FROM fibonacci_series ORDER BY n; `"See full answer

    Data Engineer
    Coding
    +4 more
  • Adobe logoAsked at Adobe 
    +25

    "In python def find_duplicates(arr1: List[int], arr2: List[int]) -> List[int]: result = list(set(arr1) & set(arr2)) return result "

    Sammy R. - "In python def find_duplicates(arr1: List[int], arr2: List[int]) -> List[int]: result = list(set(arr1) & set(arr2)) return result "See full answer

    Data Engineer
    Data Structures & Algorithms
    +2 more
  • Adobe logoAsked at Adobe 
    Video answer for 'Given stock prices for the next n days, how can you maximize your profit by buying or selling one share per day?'
    +12

    "public static int maxProfitGreedy(int[] stockPrices) { int maxProfit = 0; for(int i = 1; i todayPrice) { maxProfit += tomorrowPrice - todayPrice; } } return maxProfit; } "

    Laksitha R. - "public static int maxProfitGreedy(int[] stockPrices) { int maxProfit = 0; for(int i = 1; i todayPrice) { maxProfit += tomorrowPrice - todayPrice; } } return maxProfit; } "See full answer

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • +10

    "In the question it says: "above the overall average total posts", which to me implying a >, yet in the solution it uses >= Caused me 1 hr to find out. plz fix"

    Peter W. - "In the question it says: "above the overall average total posts", which to me implying a >, yet in the solution it uses >= Caused me 1 hr to find out. plz fix"See full answer

    Data Engineer
    Coding
    +3 more
  • Adobe logoAsked at Adobe 
    +6

    " function climbStairs(n) { // 4 iterations of Dynamic Programming solutions: // Step 1: Recursive: // if (n <= 2) return n // return climbStairs(n-1) + climbStairs(n-2) // Step 2: Top-down Memoization // const memo = {0:0, 1:1, 2:2} // function f(x) { // if (x in memo) return memo[x] // memo[x] = f(x-1) + f(x-2) // return memo[x] // } // return f(n) // Step 3: Bottom-up Tabulation // const tab = [0,1,2] // f"

    Matthew K. - " function climbStairs(n) { // 4 iterations of Dynamic Programming solutions: // Step 1: Recursive: // if (n <= 2) return n // return climbStairs(n-1) + climbStairs(n-2) // Step 2: Top-down Memoization // const memo = {0:0, 1:1, 2:2} // function f(x) { // if (x in memo) return memo[x] // memo[x] = f(x-1) + f(x-2) // return memo[x] // } // return f(n) // Step 3: Bottom-up Tabulation // const tab = [0,1,2] // f"See full answer

    Data Engineer
    Data Structures & Algorithms
    +3 more
  • Google logoAsked at Google 
    Video answer for 'Design a high-tech gym.'
    +1

    "[2:53 pm, 02/12/2021] Mayank: Before we deep dive into brainstorming the solution I go ahead and make few assumptions for clarifying questions: Step-1- Framing a problem 🎯Why users go to gym? To relieve stress by doing exercise To maintain their body To reduce their weight To remain active To make good physique 🎯Who goes to gym? Couples Group of friends Individuals Here we are trying to design high tech gym so it means we are looking to create good experience-and"

    Mayank S. - "[2:53 pm, 02/12/2021] Mayank: Before we deep dive into brainstorming the solution I go ahead and make few assumptions for clarifying questions: Step-1- Framing a problem 🎯Why users go to gym? To relieve stress by doing exercise To maintain their body To reduce their weight To remain active To make good physique 🎯Who goes to gym? Couples Group of friends Individuals Here we are trying to design high tech gym so it means we are looking to create good experience-and"See full answer

    Data Engineer
    Product Design
  • "Firstly, congratulations to both the interviewer and interviewee. This was a great learning experience However, being a Full Stack engineer and I was having the following suggestions around the Data Model - Driver & Approval can be two different tables Approval & Document - Approval can be a tuple of (userid,documentid) - comments against a rejection (marks the document which triggers rejection)In this way we can capture the entire history of approval workflow (initiate/pending/appr"

    Nilanjan D. - "Firstly, congratulations to both the interviewer and interviewee. This was a great learning experience However, being a Full Stack engineer and I was having the following suggestions around the Data Model - Driver & Approval can be two different tables Approval & Document - Approval can be a tuple of (userid,documentid) - comments against a rejection (marks the document which triggers rejection)In this way we can capture the entire history of approval workflow (initiate/pending/appr"See full answer

    Data Engineer
    Data Modeling
  • +21

    "with cte as (select ts.employee_id, e.name, t.id as test_id, max(distinct ts.score) as total from test_results as ts join tests as t on ts.test_id = t.id join employees as e on ts.employee_id = e.id group by ts.employee_id, e.name, t.id) select employee_id, name as employee_name, sum(total) as total_score from cte group by employee_id, employee_name order by total_score desc, employee_id asc ;"

    Christian B. - "with cte as (select ts.employee_id, e.name, t.id as test_id, max(distinct ts.score) as total from test_results as ts join tests as t on ts.test_id = t.id join employees as e on ts.employee_id = e.id group by ts.employee_id, e.name, t.id) select employee_id, name as employee_name, sum(total) as total_score from cte group by employee_id, employee_name order by total_score desc, employee_id asc ;"See full answer

    Data Engineer
    Coding
    +3 more
  • Adobe logoAsked at Adobe 
    Video answer for 'Find the median of two sorted arrays.'
    Data Engineer
    Data Structures & Algorithms
    +4 more
Showing 41-60 of 154