Skip to main content

Top Data Engineer Interview Questions

Review this list of 167 Data Engineer interview questions and answers verified by hiring managers and candidates.
  • 25 answers
    +22

    "The user table no longer exists as expected - I get an error that user does not contain user_id. Note that querying the table results in only user:swuoevkivrjfta select * FROM user `"

    Evan R. - "The user table no longer exists as expected - I get an error that user does not contain user_id. Note that querying the table results in only user:swuoevkivrjfta select * FROM user `"See full answer

    Data Engineer
    Coding
    +3 more
  • 28 answers
    +21

    "Wording is a bit confusing on this one. This one can be interpreted as asking for count of unmatched bookings per user. By saying average here, what they really mean "what is the proportion of unmatched bookings to total bookings for each user." Important clarifying question to ask here if this was asked in a real interview!"

    Chase C. - "Wording is a bit confusing on this one. This one can be interpreted as asking for count of unmatched bookings per user. By saying average here, what they really mean "what is the proportion of unmatched bookings to total bookings for each user." Important clarifying question to ask here if this was asked in a real interview!"See full answer

    Data Engineer
    Coding
    +3 more
  • 22 answers
    +17

    "--country names are UPPERCASE but the table in the in the question showing lowercase. That's why it took me a while to figure it out until I ran the country column WITH RECURSIVE Hierarchy AS ( SELECT e.Emp_ID, CONCAT(e.FirstName, ' ', e.MiddleName, ' ', e.LastName) AS FullName, e.Manager_ID, 0 AS Level, CASE WHEN e.Country = 'IRELAND' THEN s.Salary * 1.09 WHEN e.Country = 'INDIA' THEN s.Salary * 0.012 ELSE s.Salary "

    Victor N. - "--country names are UPPERCASE but the table in the in the question showing lowercase. That's why it took me a while to figure it out until I ran the country column WITH RECURSIVE Hierarchy AS ( SELECT e.Emp_ID, CONCAT(e.FirstName, ' ', e.MiddleName, ' ', e.LastName) AS FullName, e.Manager_ID, 0 AS Level, CASE WHEN e.Country = 'IRELAND' THEN s.Salary * 1.09 WHEN e.Country = 'INDIA' THEN s.Salary * 0.012 ELSE s.Salary "See full answer

    Data Engineer
    Coding
    +3 more
  • Adobe logoAsked at Adobe 
    30 answers
    +22

    "This problem could be solved in two ways(both using Kadane's algorithm): Simple iterating 1-D dp function maxSubarraySum(nums) { const n = nums.length; if ( n === 0) return 0; const dp = Array(n).fill(0); dp[0] = nums[0]; for (let i = 1; i < n; i++) { dp[i] = Math.max(nums[i], dp[i - 1] + nums[i]); } return Math.max(...dp); } "

    Mark K. - "This problem could be solved in two ways(both using Kadane's algorithm): Simple iterating 1-D dp function maxSubarraySum(nums) { const n = nums.length; if ( n === 0) return 0; const dp = Array(n).fill(0); dp[0] = nums[0]; for (let i = 1; i < n; i++) { dp[i] = Math.max(nums[i], dp[i - 1] + nums[i]); } return Math.max(...dp); } "See full answer

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Apple logoAsked at Apple 
    24 answers
    +21

    "function isValid(s) { const stack = []; for (let i=0; i < s.length; i++) { const char = s.charAt(i); if (['(', '{', '['].includes(char)) { stack.push(char); } else { const top = stack.pop(); if ((char === ')' && top !== '(') || (char === '}' && top !== '{') || (char === ']' && top !== '[')) { return false; } } } return stack.length === 0"

    Tiago R. - "function isValid(s) { const stack = []; for (let i=0; i < s.length; i++) { const char = s.charAt(i); if (['(', '{', '['].includes(char)) { stack.push(char); } else { const top = stack.pop(); if ((char === ')' && top !== '(') || (char === '}' && top !== '{') || (char === ']' && top !== '[')) { return false; } } } return stack.length === 0"See full answer

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • 🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

  • Adobe logoAsked at Adobe 
    4 answers
    +1

    " Compare alternate houses i.e for each house starting from the third, calculate the maximum money that can be stolen up to that house by choosing between: Skipping the current house and taking the maximum money stolen up to the previous house. Robbing the current house and adding its value to the maximum money stolen up to the house two steps back. package main import ( "fmt" ) // rob function calculates the maximum money a robber can steal func maxRob(nums []int) int { ln"

    VContaineers - " Compare alternate houses i.e for each house starting from the third, calculate the maximum money that can be stolen up to that house by choosing between: Skipping the current house and taking the maximum money stolen up to the previous house. Robbing the current house and adding its value to the maximum money stolen up to the house two steps back. package main import ( "fmt" ) // rob function calculates the maximum money a robber can steal func maxRob(nums []int) int { ln"See full answer

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Adobe logoAsked at Adobe 
    Add answer
    Video answer for 'Find the median of two sorted arrays.'
    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Google logoAsked at Google 
    5 answers
    +2

    "WITH RECURSIVE fibonacci_series AS ( SELECT 1 AS n, 0 AS fib1, 1 AS fib2 UNION ALL SELECT n + 1 AS n, fib2 AS fib1, fib1 + fib2 AS fib2 FROM fibonacci_series WHERE n < 20 -- Limit the series to 20 numbers ) SELECT n, fib1 AS fib FROM fibonacci_series ORDER BY n; `"

    Yashasvi V. - "WITH RECURSIVE fibonacci_series AS ( SELECT 1 AS n, 0 AS fib1, 1 AS fib2 UNION ALL SELECT n + 1 AS n, fib2 AS fib1, fib1 + fib2 AS fib2 FROM fibonacci_series WHERE n < 20 -- Limit the series to 20 numbers ) SELECT n, fib1 AS fib FROM fibonacci_series ORDER BY n; `"See full answer

    Data Engineer
    Coding
    +4 more
  • Databricks logoAsked at Databricks 
    3 answers

    "user table - with userid, username, email, phonenumber, accountcreateddate exercises table - types of exercises - indoor walk, outdoor walk, running, stairs, cycling, swimming etc - exerciseid, exercisetype date table - date, day, month, year - with dateid Session table - userid, sessiondateid(linked to dateid in date table), exerciseid, distance covered, calories spent, starttime, endtime "

    Anonymous Anteater - "user table - with userid, username, email, phonenumber, accountcreateddate exercises table - types of exercises - indoor walk, outdoor walk, running, stairs, cycling, swimming etc - exerciseid, exercisetype date table - date, day, month, year - with dateid Session table - userid, sessiondateid(linked to dateid in date table), exerciseid, distance covered, calories spent, starttime, endtime "See full answer

    Data Engineer
    Data Modeling
  • Visa logoAsked at Visa 
    3 answers

    "I generally struggle with stakeholders and partners who doesn't communicate enough. Now it could be either they don't invest sufficient time and energy in doing so or at times they lack the skill sets to do so. In both the cases, the entire responsibility fell on the other person to dig deep into why someone is doing the way they are doing, reading into patterns and behaviour of their personality and adapting to those communication styles"

    Lati K. - "I generally struggle with stakeholders and partners who doesn't communicate enough. Now it could be either they don't invest sufficient time and energy in doing so or at times they lack the skill sets to do so. In both the cases, the entire responsibility fell on the other person to dig deep into why someone is doing the way they are doing, reading into patterns and behaviour of their personality and adapting to those communication styles"See full answer

    Data Engineer
    Behavioral
    +2 more
  • Amazon logoAsked at Amazon 
    9 answers
    +6

    "DFS with check of an already seen node in the graph would work from collections import deque, defaultdict from typing import List def iscourseloopdfs(idcourse: int, graph: defaultdict[list]) -> bool: stack = deque([(id_course)]) seen_courses = set() while stack: print(stack) curr_course = stack.pop() if currcourse in seencourses: return True seencourses.add(currcourse) for dependency in graph[curr_course]: "

    Gabriele G. - "DFS with check of an already seen node in the graph would work from collections import deque, defaultdict from typing import List def iscourseloopdfs(idcourse: int, graph: defaultdict[list]) -> bool: stack = deque([(id_course)]) seen_courses = set() while stack: print(stack) curr_course = stack.pop() if currcourse in seencourses: return True seencourses.add(currcourse) for dependency in graph[curr_course]: "See full answer

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Adobe logoAsked at Adobe 
    2 answers
    Video answer for 'Given the root of a binary tree of integers, return the maximum path sum.'

    "\# Definition for a binary tree node. class TreeNode: def init(self, val=0, left=None, right=None): self.val = val self.left = left self.right = right class Solution: def maxPathSum(self, root: TreeNode) -> int: self.max_sum = float('-inf')"

    Jerry O. - "\# Definition for a binary tree node. class TreeNode: def init(self, val=0, left=None, right=None): self.val = val self.left = left self.right = right class Solution: def maxPathSum(self, root: TreeNode) -> int: self.max_sum = float('-inf')"See full answer

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • 3 answers
    Video answer for 'Design a Data Warehouse Schema for a Ride-Sharing Service'

    "Firstly, congratulations to both the interviewer and interviewee. This was a great learning experience However, being a Full Stack engineer and I was having the following suggestions around the Data Model - Driver & Approval can be two different tables Approval & Document - Approval can be a tuple of (userid,documentid) - comments against a rejection (marks the document which triggers rejection)In this way we can capture the entire history of approval workflow (initiate/pending/appr"

    Nilanjan D. - "Firstly, congratulations to both the interviewer and interviewee. This was a great learning experience However, being a Full Stack engineer and I was having the following suggestions around the Data Model - Driver & Approval can be two different tables Approval & Document - Approval can be a tuple of (userid,documentid) - comments against a rejection (marks the document which triggers rejection)In this way we can capture the entire history of approval workflow (initiate/pending/appr"See full answer

    Data Engineer
    Data Modeling
  • Add answer
    Video answer for 'Design an ETL Pipeline for a ML Platform for AWS'
    Data Engineer
    Data Pipeline Design
  • Add answer
    Video answer for 'Analyze Monthly Customer Transactions'
    Data Engineer
    Coding
    +3 more
  • 1 answer
    Video answer for 'Design a data warehouse schema for Instagram.'
    Data Engineer
    Data Modeling
  • 28 answers
    +25

    "select name, stock from products p left join transactions t on p.id = t.product_id order by date desc limit 1"

    Daniel C. - "select name, stock from products p left join transactions t on p.id = t.product_id order by date desc limit 1"See full answer

    Data Engineer
    Coding
    +3 more
  • Amazon logoAsked at Amazon 
    6 answers

    "1) select avg(session) from table where session> 180 2) select round(sessiontime/300)*300 as sessionbin, count() as sessioncount from table group by round(sessiontime/300)300 order by session_bin 3) SELECT t1.country AS country_a, t2.country AS country_b FROM ( SELECT country, COUNT(*) AS session_count FROM yourtablename GROUP BY country ) AS t1 JOIN ( SELECT country, COUNT(*) AS session_count FROM yourtablename `GROUP BY countr"

    Erjan G. - "1) select avg(session) from table where session> 180 2) select round(sessiontime/300)*300 as sessionbin, count() as sessioncount from table group by round(sessiontime/300)300 order by session_bin 3) SELECT t1.country AS country_a, t2.country AS country_b FROM ( SELECT country, COUNT(*) AS session_count FROM yourtablename GROUP BY country ) AS t1 JOIN ( SELECT country, COUNT(*) AS session_count FROM yourtablename `GROUP BY countr"See full answer

    Data Engineer
    Coding
    +4 more
  • Add answer
    Video answer for 'Design Netflix's Clickstream Data Pipeline'
    Data Engineer
    Data Pipeline Design
    +1 more
  • TikTok logoAsked at TikTok 
    Add answer
    Video answer for 'Split an array into equal sum subarrays'
    Data Engineer
    Data Structures & Algorithms
    +1 more
Showing 41-60 of 167