Skip to main content

Data Engineer Interview Questions

Review this list of 170 Data Engineer interview questions and answers verified by hiring managers and candidates.
  • Accenture logoAsked at Accenture 
    5 answers
    Video answer for 'Design an AI data product.'
    +2

    "Understand the business problem: Identify the business problem that the AI data product is intended to solve. Identify the target audience: Understand who will be using the data and what problem they will be solving for using the data. This will inform the features and functionality that should be included in the product. Gather and preprocess the data: Collect and preprocess the data that is relevant to the problem that it is being solved for. This will inform the AI algorithm"

    M D. - "Understand the business problem: Identify the business problem that the AI data product is intended to solve. Identify the target audience: Understand who will be using the data and what problem they will be solving for using the data. This will inform the features and functionality that should be included in the product. Gather and preprocess the data: Collect and preprocess the data that is relevant to the problem that it is being solved for. This will inform the AI algorithm"See full answer

    Data Engineer
    Product Design
    +2 more
  • 28 answers
    +21

    "SELECT u.user_id, u.user_name, u.email, ROUND(AVG(CASE WHEN b.status = 'Unmatched' THEN 1.0 ELSE 0 END), 2) AS avgunmatchedbookings FROM users u LEFT JOIN bookings b ON u.userid = b.userid GROUP BY u.user_id, u.user_name, u.email; `"

    Akshay D. - "SELECT u.user_id, u.user_name, u.email, ROUND(AVG(CASE WHEN b.status = 'Unmatched' THEN 1.0 ELSE 0 END), 2) AS avgunmatchedbookings FROM users u LEFT JOIN bookings b ON u.userid = b.userid GROUP BY u.user_id, u.user_name, u.email; `"See full answer

    Data Engineer
    Coding
    +3 more
  • 22 answers
    +17

    "--country names are UPPERCASE but the table in the in the question showing lowercase. That's why it took me a while to figure it out until I ran the country column WITH RECURSIVE Hierarchy AS ( SELECT e.Emp_ID, CONCAT(e.FirstName, ' ', e.MiddleName, ' ', e.LastName) AS FullName, e.Manager_ID, 0 AS Level, CASE WHEN e.Country = 'IRELAND' THEN s.Salary * 1.09 WHEN e.Country = 'INDIA' THEN s.Salary * 0.012 ELSE s.Salary "

    Victor N. - "--country names are UPPERCASE but the table in the in the question showing lowercase. That's why it took me a while to figure it out until I ran the country column WITH RECURSIVE Hierarchy AS ( SELECT e.Emp_ID, CONCAT(e.FirstName, ' ', e.MiddleName, ' ', e.LastName) AS FullName, e.Manager_ID, 0 AS Level, CASE WHEN e.Country = 'IRELAND' THEN s.Salary * 1.09 WHEN e.Country = 'INDIA' THEN s.Salary * 0.012 ELSE s.Salary "See full answer

    Data Engineer
    Coding
    +3 more
  • Accenture logoAsked at Accenture 
    4 answers

    "One weakness of mine is reaching stuff through higher shelves but I'm able to use a step ladder to address the situation."

    Amparo L. - "One weakness of mine is reaching stuff through higher shelves but I'm able to use a step ladder to address the situation."See full answer

    Data Engineer
    Behavioral
    +2 more
  • Adobe logoAsked at Adobe 
    69 answers
    Video answer for 'Move all zeros to the end of an array.'
    +64

    "Initialize left pointer: Set a left pointer left to 0. Iterate through the array: Iterate through the array from left to right. If the current element is not 0, swap it with the element at the left pointer and increment left. Time complexity: O(n). The loop iterates through the entire array once, making it linear time. Space complexity: O(1). The algorithm operates in-place, modifying the input array directly without using additional data structures. "

    Avon T. - "Initialize left pointer: Set a left pointer left to 0. Iterate through the array: Iterate through the array from left to right. If the current element is not 0, swap it with the element at the left pointer and increment left. Time complexity: O(n). The loop iterates through the entire array once, making it linear time. Space complexity: O(1). The algorithm operates in-place, modifying the input array directly without using additional data structures. "See full answer

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • 🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

  • Adobe logoAsked at Adobe 
    19 answers
    Video answer for 'Given stock prices for the next n days, how can you maximize your profit by buying or selling one share per day?'
    +14

    "public static int maxProfitGreedy(int[] stockPrices) { int maxProfit = 0; for(int i = 1; i todayPrice) { maxProfit += tomorrowPrice - todayPrice; } } return maxProfit; } "

    Laksitha R. - "public static int maxProfitGreedy(int[] stockPrices) { int maxProfit = 0; for(int i = 1; i todayPrice) { maxProfit += tomorrowPrice - todayPrice; } } return maxProfit; } "See full answer

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Amazon logoAsked at Amazon 
    9 answers
    +6

    "DFS with check of an already seen node in the graph would work from collections import deque, defaultdict from typing import List def iscourseloopdfs(idcourse: int, graph: defaultdict[list]) -> bool: stack = deque([(id_course)]) seen_courses = set() while stack: print(stack) curr_course = stack.pop() if currcourse in seencourses: return True seencourses.add(currcourse) for dependency in graph[curr_course]: "

    Gabriele G. - "DFS with check of an already seen node in the graph would work from collections import deque, defaultdict from typing import List def iscourseloopdfs(idcourse: int, graph: defaultdict[list]) -> bool: stack = deque([(id_course)]) seen_courses = set() while stack: print(stack) curr_course = stack.pop() if currcourse in seencourses: return True seencourses.add(currcourse) for dependency in graph[curr_course]: "See full answer

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Adobe logoAsked at Adobe 
    66 answers
    Video answer for 'Product of Array Except Self'
    +60

    "If 0's aren't a concern, couldn't we just multiply all numbers. and then divide product by each number in the list ? if there's more than one zero, then we just return an array of 0s if there's one zero, then we just replace 0 with product and rest 0s. what am i missing?"

    Sachin R. - "If 0's aren't a concern, couldn't we just multiply all numbers. and then divide product by each number in the list ? if there's more than one zero, then we just return an array of 0s if there's one zero, then we just replace 0 with product and rest 0s. what am i missing?"See full answer

    Data Engineer
    Data Structures & Algorithms
    +3 more
  • OpenAI logoAsked at OpenAI 
    Add answer
    Data Engineer
    Behavioral
    +6 more
  • 28 answers
    +25

    "select name, stock from products p left join transactions t on p.id = t.product_id order by date desc limit 1"

    Daniel C. - "select name, stock from products p left join transactions t on p.id = t.product_id order by date desc limit 1"See full answer

    Data Engineer
    Coding
    +3 more
  • Adobe logoAsked at Adobe 
    34 answers
    +30

    " from typing import List one pass O(n) def find_duplicates(arr1: List[int], arr2: List[int]) -> List[int]: duplicates = [] i1 = i2 = 0 while i1 < len(arr1) and i2 < len(arr2): if arr1[i1] == arr2[i2]: duplicates.append(arr1[i1]) i2 += 1 i1 += 1 return duplicates debug your code below print(find_duplicates([1, 2, 3, 5, 6, 7], [3, 6, 7, 8, 20])) `"

    Rick E. - " from typing import List one pass O(n) def find_duplicates(arr1: List[int], arr2: List[int]) -> List[int]: duplicates = [] i1 = i2 = 0 while i1 < len(arr1) and i2 < len(arr2): if arr1[i1] == arr2[i2]: duplicates.append(arr1[i1]) i2 += 1 i1 += 1 return duplicates debug your code below print(find_duplicates([1, 2, 3, 5, 6, 7], [3, 6, 7, 8, 20])) `"See full answer

    Data Engineer
    Data Structures & Algorithms
    +2 more
  • Databricks logoAsked at Databricks 
    5 answers
    +2

    "This is yet another classic case of evolution of data landscape to account for diversities in the data formats sacrificing restrictive but key components at first and added later to make the solution more effective. Data warehouse -> Data Lake -> Data Lakehouse (Data Lake + Data Warehouse) Data warehouse - A solution to store data in central place (analytics (read) heavy) with stringent schema (structured). Very useful for historical queries and analytics. Schema on write check. Only used for"

    Karthik R. - "This is yet another classic case of evolution of data landscape to account for diversities in the data formats sacrificing restrictive but key components at first and added later to make the solution more effective. Data warehouse -> Data Lake -> Data Lakehouse (Data Lake + Data Warehouse) Data warehouse - A solution to store data in central place (analytics (read) heavy) with stringent schema (structured). Very useful for historical queries and analytics. Schema on write check. Only used for"See full answer

    Data Engineer
    Data Pipeline Design
  • Discord logoAsked at Discord 
    2 answers

    " A couple of years ago, we were working on a project to integrate a new third-party data feed into our existing data processing pipeline. This data feed was critical for enhancing our trading algorithms with more comprehensive market data. Given the tight timeline and high stakes, I decided to push for a rapid implementation. In my eagerness to meet the deadline, I underestimated the complexity of integrating this new data feed. I did not allocate sufficient time for thorough testing and valida"

    Scott S. - " A couple of years ago, we were working on a project to integrate a new third-party data feed into our existing data processing pipeline. This data feed was critical for enhancing our trading algorithms with more comprehensive market data. Given the tight timeline and high stakes, I decided to push for a rapid implementation. In my eagerness to meet the deadline, I underestimated the complexity of integrating this new data feed. I did not allocate sufficient time for thorough testing and valida"See full answer

    Data Engineer
    Behavioral
    +2 more
  • "How do you find consecutive days for login (MySQL, SQL, date, subquery, MySQL 5.7, development)? 1 Follow Request Answer More All related (34) Recommended 📷 Trausti Thor Johannsson · Follow Been using MySQL for more than 16 yearsDec 27 There are functions like DATEDIFF but there are also BETWE"

    Hayatu H. - "How do you find consecutive days for login (MySQL, SQL, date, subquery, MySQL 5.7, development)? 1 Follow Request Answer More All related (34) Recommended 📷 Trausti Thor Johannsson · Follow Been using MySQL for more than 16 yearsDec 27 There are functions like DATEDIFF but there are also BETWE"See full answer

    Data Engineer
    Coding
    +1 more
  • Adobe logoAsked at Adobe 
    15 answers
    +10

    " function climbStairs(n) { // 4 iterations of Dynamic Programming solutions: // Step 1: Recursive: // if (n <= 2) return n // return climbStairs(n-1) + climbStairs(n-2) // Step 2: Top-down Memoization // const memo = {0:0, 1:1, 2:2} // function f(x) { // if (x in memo) return memo[x] // memo[x] = f(x-1) + f(x-2) // return memo[x] // } // return f(n) // Step 3: Bottom-up Tabulation // const tab = [0,1,2] // f"

    Matthew K. - " function climbStairs(n) { // 4 iterations of Dynamic Programming solutions: // Step 1: Recursive: // if (n <= 2) return n // return climbStairs(n-1) + climbStairs(n-2) // Step 2: Top-down Memoization // const memo = {0:0, 1:1, 2:2} // function f(x) { // if (x in memo) return memo[x] // memo[x] = f(x-1) + f(x-2) // return memo[x] // } // return f(n) // Step 3: Bottom-up Tabulation // const tab = [0,1,2] // f"See full answer

    Data Engineer
    Data Structures & Algorithms
    +3 more
  • 14 answers
    +11

    "In the question it says: "above the overall average total posts", which to me implying a >, yet in the solution it uses >= Caused me 1 hr to find out. plz fix"

    Peter W. - "In the question it says: "above the overall average total posts", which to me implying a >, yet in the solution it uses >= Caused me 1 hr to find out. plz fix"See full answer

    Data Engineer
    Coding
    +3 more
  • Google logoAsked at Google 
    5 answers
    +2

    "WITH RECURSIVE fibonacci_series AS ( SELECT 1 AS n, 0 AS fib1, 1 AS fib2 UNION ALL SELECT n + 1 AS n, fib2 AS fib1, fib1 + fib2 AS fib2 FROM fibonacci_series WHERE n < 20 -- Limit the series to 20 numbers ) SELECT n, fib1 AS fib FROM fibonacci_series ORDER BY n; `"

    Yashasvi V. - "WITH RECURSIVE fibonacci_series AS ( SELECT 1 AS n, 0 AS fib1, 1 AS fib2 UNION ALL SELECT n + 1 AS n, fib2 AS fib1, fib1 + fib2 AS fib2 FROM fibonacci_series WHERE n < 20 -- Limit the series to 20 numbers ) SELECT n, fib1 AS fib FROM fibonacci_series ORDER BY n; `"See full answer

    Data Engineer
    Coding
    +4 more
  • Google logoAsked at Google 
    27 answers
    +24

    "with cte as (select ts.employee_id, e.name, t.id as test_id, max(distinct ts.score) as total from test_results as ts join tests as t on ts.test_id = t.id join employees as e on ts.employee_id = e.id group by ts.employee_id, e.name, t.id) select employee_id, name as employee_name, sum(total) as total_score from cte group by employee_id, employee_name order by total_score desc, employee_id asc ;"

    Christian B. - "with cte as (select ts.employee_id, e.name, t.id as test_id, max(distinct ts.score) as total from test_results as ts join tests as t on ts.test_id = t.id join employees as e on ts.employee_id = e.id group by ts.employee_id, e.name, t.id) select employee_id, name as employee_name, sum(total) as total_score from cte group by employee_id, employee_name order by total_score desc, employee_id asc ;"See full answer

    Data Engineer
    Coding
    +3 more
  • Adobe logoAsked at Adobe 
    54 answers
    +50

    "a. Sort the array elements. b. take two pointers at index 0 and index Len-1; c. if the sum at the two pointers is target; break and return the pair d. if the sum is smaller, then move left pointer by 1 e. else move right pointer by 1; run the logic till the target is met or right pointer crosses the left pointer."

    Komal S. - "a. Sort the array elements. b. take two pointers at index 0 and index Len-1; c. if the sum at the two pointers is target; break and return the pair d. if the sum is smaller, then move left pointer by 1 e. else move right pointer by 1; run the logic till the target is met or right pointer crosses the left pointer."See full answer

    Data Engineer
    Data Structures & Algorithms
    +5 more
  • Google logoAsked at Google 
    4 answers
    Video answer for 'Design a high-tech gym.'
    +1

    "[2:53 pm, 02/12/2021] Mayank: Before we deep dive into brainstorming the solution I go ahead and make few assumptions for clarifying questions: Step-1- Framing a problem 🎯Why users go to gym? To relieve stress by doing exercise To maintain their body To reduce their weight To remain active To make good physique 🎯Who goes to gym? Couples Group of friends Individuals Here we are trying to design high tech gym so it means we are looking to create good experience-and"

    Mayank S. - "[2:53 pm, 02/12/2021] Mayank: Before we deep dive into brainstorming the solution I go ahead and make few assumptions for clarifying questions: Step-1- Framing a problem 🎯Why users go to gym? To relieve stress by doing exercise To maintain their body To reduce their weight To remain active To make good physique 🎯Who goes to gym? Couples Group of friends Individuals Here we are trying to design high tech gym so it means we are looking to create good experience-and"See full answer

    Data Engineer
    Product Design
Showing 41-60 of 170
Exponent

Get updates in your inbox with the latest tips, job listings, and more.

Follow Us

Products
Courses
Interview Questions
Interview Experiences
Popular articles
Guides
Coaching
For Partners
Company
Exponent © 2026
Terms of Service | Privacy