Skip to main content

Data Engineer Interview Questions

Review this list of 170 Data Engineer interview questions and answers verified by hiring managers and candidates.
  • Deloitte logoAsked at Deloitte 
    3 answers

    "BETWEEN and HAVING clauses in SQL serve different purposes: 1. BETWEEN Clause Used to filter rows based on a range of values. Works with numeric, date, or text values. Can be used with WHERE or HAVING clauses. The range includes both lower and upper bounds. Example: Filtering employees with salaries between 30,000 and 50,000 `SELECT * FROM Employees WHERE salary BETWEEN 30000 AND 50000;` 2. HAVING Clause Used to filter **groups"

    Meenakshi D. - "BETWEEN and HAVING clauses in SQL serve different purposes: 1. BETWEEN Clause Used to filter rows based on a range of values. Works with numeric, date, or text values. Can be used with WHERE or HAVING clauses. The range includes both lower and upper bounds. Example: Filtering employees with salaries between 30,000 and 50,000 `SELECT * FROM Employees WHERE salary BETWEEN 30000 AND 50000;` 2. HAVING Clause Used to filter **groups"See full answer

    Data Engineer
    Concept
    +4 more
  • Adobe logoAsked at Adobe 
    Add answer
    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Meta logoAsked at Meta 
    1 answer

    "Merge Sort"

    Ankita G. - "Merge Sort"See full answer

    Data Engineer
    Data Structures & Algorithms
    +1 more
  • 7 answers
    +4

    "-- LTV = Sum of all purchases made by that user -- order the results by desc on LTV select u.user_id, sum(a.purchase_value) as LTV from user_sessions u join attribution a on u.sessionid = a.sessionid group by u.user_id order by sum(a.purchase_value) desc"

    Mohit C. - "-- LTV = Sum of all purchases made by that user -- order the results by desc on LTV select u.user_id, sum(a.purchase_value) as LTV from user_sessions u join attribution a on u.sessionid = a.sessionid group by u.user_id order by sum(a.purchase_value) desc"See full answer

    Data Engineer
    Coding
    +3 more
  • Airbnb logoAsked at Airbnb 
    2 answers

    "Clarification questions What is the purpose of connecting the DB? Do we expect high-volumes of traffic to hit the DB Do we have scalability or reliability concerns? Format Code -> DB Code -> Cache -> DB API -> Cache -> DB - APIs are built for a purpose and have a specified protocol (GET, POST, DELETE) to speak to the DB. APIs can also use a contract to retrieve information from a DB much faster than code. Load balanced APIs -> Cache -> DB **Aut"

    Aaron W. - "Clarification questions What is the purpose of connecting the DB? Do we expect high-volumes of traffic to hit the DB Do we have scalability or reliability concerns? Format Code -> DB Code -> Cache -> DB API -> Cache -> DB - APIs are built for a purpose and have a specified protocol (GET, POST, DELETE) to speak to the DB. APIs can also use a contract to retrieve information from a DB much faster than code. Load balanced APIs -> Cache -> DB **Aut"See full answer

    Data Engineer
    Concept
    +6 more
  • 🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

  • Adobe logoAsked at Adobe 
    6 answers
    +3

    "def mergeTwoListsRecursive(l1, l2): if not l1 or not l2: return l1 or l2 if l1.val < l2.val: l1.next = mergeTwoListsRecursive(l1.next, l2) return l1 else: l2.next = mergeTwoListsRecursive(l1, l2.next) return l2 "

    Ramachandra N. - "def mergeTwoListsRecursive(l1, l2): if not l1 or not l2: return l1 or l2 if l1.val < l2.val: l1.next = mergeTwoListsRecursive(l1.next, l2) return l1 else: l2.next = mergeTwoListsRecursive(l1, l2.next) return l2 "See full answer

    Data Engineer
    Data Structures & Algorithms
    +5 more
  • Adobe logoAsked at Adobe 
    10 answers
    +6

    "bool isValidBST(TreeNode* root, long min = LONGMIN, long max = LONGMAX){ if (root == NULL) return true; if (root->val val >= max) return false; return isValidBST(root->left, min, root->val) && isValidBST(root->right, root->val, max); } `"

    Alvaro R. - "bool isValidBST(TreeNode* root, long min = LONGMIN, long max = LONGMAX){ if (root == NULL) return true; if (root->val val >= max) return false; return isValidBST(root->left, min, root->val) && isValidBST(root->right, root->val, max); } `"See full answer

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Apple logoAsked at Apple 
    7 answers
    +3

    "This could be done using two-pointer approach assuming array is sorted: left and right pointers. We need track two sums (left and right) as we move pointers. For moving pointers we will move left to right by 1 (increment) when right sum is greater. We will move right pointer to left by 1 (decrement) when left sum is greater. at some point we will either get the sum same and that's when we exit from the loop. 0-left will be one array and right-(n-1) will be another array. We are not going to mo"

    Bhaskar B. - "This could be done using two-pointer approach assuming array is sorted: left and right pointers. We need track two sums (left and right) as we move pointers. For moving pointers we will move left to right by 1 (increment) when right sum is greater. We will move right pointer to left by 1 (decrement) when left sum is greater. at some point we will either get the sum same and that's when we exit from the loop. 0-left will be one array and right-(n-1) will be another array. We are not going to mo"See full answer

    Data Engineer
    Data Structures & Algorithms
    +2 more
  • Add answer
    Video answer for 'Design an ETL Pipeline for Slack for School'
    Data Engineer
    Data Pipeline Design
  • Meta logoAsked at Meta 
    1 answer

    "int[] sqSorted(int[] nums) { int i = 0, j = nums.length-1; int k = nums.length-1; int[] sqs = new int[nums.length]; while(i n1) { sqs[k--] = n2; j--; } else { sqs[k--] = n1; i++; } } for(int n: sqs) System.out.println(n); return sqs; }"

    Mahaboob P. - "int[] sqSorted(int[] nums) { int i = 0, j = nums.length-1; int k = nums.length-1; int[] sqs = new int[nums.length]; while(i n1) { sqs[k--] = n2; j--; } else { sqs[k--] = n1; i++; } } for(int n: sqs) System.out.println(n); return sqs; }"See full answer

    Data Engineer
    Data Structures & Algorithms
    +2 more
  • 1 answer
    Video answer for 'Design a Data Warehouse Schema for Customer Support'

    "not able to understand the accent of the candidate"

    Akash A. - "not able to understand the accent of the candidate"See full answer

    Data Engineer
    Data Modeling
  • Atlassian logoAsked at Atlassian 
    Add answer
    Data Engineer
    Behavioral
    +7 more
  • Amazon logoAsked at Amazon 
    2 answers

    "1) Have a common goal 2) Have a clear and fair accountability between teams 3) Ensure conflicts are resolved in time on common issues 4) Promote common Brain-storming , problem solving sessions 5) Most important , Have clear and effective communication established and practised"

    Saurabh N. - "1) Have a common goal 2) Have a clear and fair accountability between teams 3) Ensure conflicts are resolved in time on common issues 4) Promote common Brain-storming , problem solving sessions 5) Most important , Have clear and effective communication established and practised"See full answer

    Data Engineer
    Behavioral
    +5 more
  • Tesla logoAsked at Tesla 
    Add answer
    Video answer for 'Given a matrix of m x n elements (m rows, n columns), return all elements of the matrix in clockwise spiral order.'
    Data Engineer
    Data Structures & Algorithms
    +3 more
  • Add answer
    Video answer for 'Design a Data Warehouse Schema for Airbnb'
    Data Engineer
    Data Modeling
  • Adobe logoAsked at Adobe 
    13 answers
    +10

    "from typing import List def traprainwater(height: List[int]) -> int: if not height: return 0 l, r = 0, len(height) - 1 leftMax, rightMax = height[l], height[r] res = 0 while l < r: if leftMax < rightMax: l += 1 leftMax = max(leftMax, height[l]) res += leftMax - height[l] else: r -= 1 rightMax = max(rightMax, height[r]) "

    Anonymous Roadrunner - "from typing import List def traprainwater(height: List[int]) -> int: if not height: return 0 l, r = 0, len(height) - 1 leftMax, rightMax = height[l], height[r] res = 0 while l < r: if leftMax < rightMax: l += 1 leftMax = max(leftMax, height[l]) res += leftMax - height[l] else: r -= 1 rightMax = max(rightMax, height[r]) "See full answer

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Adobe logoAsked at Adobe 
    4 answers
    +1

    "static boolean sudokuSolve(char board) { return sudokuSolve(board, 0, 0); } static boolean sudokuSolve(char board, int r, int c) { if(c>=board[0].length) { r=r+1; c=0; } if(r>=board.length) return true; if(boardr=='.') { for(int num=1; num<=9; num++) { boardr=(char)('0' + num); if(isValidPosition(board, r, c)) { if(sudokuSolve(board, r, c+1)) return true; } boardr='.'; } } else { return sudokuSolve(board, r, c+1); } return false; } static boolean isValidPosition(char b"

    Divya R. - "static boolean sudokuSolve(char board) { return sudokuSolve(board, 0, 0); } static boolean sudokuSolve(char board, int r, int c) { if(c>=board[0].length) { r=r+1; c=0; } if(r>=board.length) return true; if(boardr=='.') { for(int num=1; num<=9; num++) { boardr=(char)('0' + num); if(isValidPosition(board, r, c)) { if(sudokuSolve(board, r, c+1)) return true; } boardr='.'; } } else { return sudokuSolve(board, r, c+1); } return false; } static boolean isValidPosition(char b"See full answer

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Google logoAsked at Google 
    1 answer

    "Hadoop is better than PySpark when you are dealing with extremely large scale, batch oriented, non-iterative workloads where in-memory computing isn't feasible/ necessary, like log storage or ETL workflows that don't require high response times. It's also better in situations where the Hadoop ecosystem is already deeply embedded and where there is a need for resource conscious, fault tolerant computation without the overhead of Spark's memory constraints. In these such scenarios, Hadoop's disk-b"

    Joshua R. - "Hadoop is better than PySpark when you are dealing with extremely large scale, batch oriented, non-iterative workloads where in-memory computing isn't feasible/ necessary, like log storage or ETL workflows that don't require high response times. It's also better in situations where the Hadoop ecosystem is already deeply embedded and where there is a need for resource conscious, fault tolerant computation without the overhead of Spark's memory constraints. In these such scenarios, Hadoop's disk-b"See full answer

    Data Engineer
    Data Pipeline Design
  • "There are 2 questions popping into my mind: Should the 2nd job have to kick off at 12:30AM? Are there others depending on the 2nd job? If both answers are no, we may simply postpone the second job to allow sufficient time for the first one to complete. If they are yeses, we could let the 2nd job retry to a certain amount of times. Make sure that even reaching the maximum of retries won't delay or fail the following jobs."

    Anzhe M. - "There are 2 questions popping into my mind: Should the 2nd job have to kick off at 12:30AM? Are there others depending on the 2nd job? If both answers are no, we may simply postpone the second job to allow sufficient time for the first one to complete. If they are yeses, we could let the 2nd job retry to a certain amount of times. Make sure that even reaching the maximum of retries won't delay or fail the following jobs."See full answer

    Data Engineer
    Data Pipeline Design
  • 6 answers
    +3

    "WITH cte AS ( SELECT customer_id, COUNT(*) AS noofoffences FROM transactions WHERE receipt_number like '%999%' OR receipt_number like '%1234%' OR receipt_number like '%XYZ%' GROUP BY 1 HAVING COUNT(*) >=2 ) SELECT firstname, lastname, receipt_number, noofoffences FROM cte INNER JOIN customers using(customer_id) INNER JOIN transactions using(customer_id) ORDER BY 1,2,3,4 `"

    Michelle M. - "WITH cte AS ( SELECT customer_id, COUNT(*) AS noofoffences FROM transactions WHERE receipt_number like '%999%' OR receipt_number like '%1234%' OR receipt_number like '%XYZ%' GROUP BY 1 HAVING COUNT(*) >=2 ) SELECT firstname, lastname, receipt_number, noofoffences FROM cte INNER JOIN customers using(customer_id) INNER JOIN transactions using(customer_id) ORDER BY 1,2,3,4 `"See full answer

    Data Engineer
    Coding
    +3 more
Showing 101-120 of 170
Exponent

Get updates in your inbox with the latest tips, job listings, and more.

Follow Us

Products
Courses
Interview Questions
Interview Experiences
Popular articles
Guides
Coaching
For Partners
Company
Exponent © 2026
Terms of Service | Privacy