Data Engineer Interview Questions

Review this list of 154 Data Engineer interview questions and answers verified by hiring managers and candidates.

Asked at Adobe, Bytedance, LinkedIn + 3 more • 9 months ago
Merge two sorted lists
Data Engineer
Data Structures & Algorithms
+4 more
6 answers
Nikhil R. - "public class sample { public int [] merge(int [] a, int [] b) { if(a == null || a.length == 0 || b == null || b.length == 0) return null; int i = 0, j = 0, index = -1; int [] merged = new int[a.length + b.length]; while (i < a.length && j < b.length) { if(a[i] < b[j]) merged[++index] = a[i++]; else merged[++index] = b[j++]; } while (i < a.length) { merged[++index] = a[i++]; } ..."
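The preview above is cut off, so here is a minimal Python sketch of the same two-index merge, written for illustration only. The function name merge_sorted is an assumption, and unlike the quoted answer this version returns the non-empty input when the other one is empty instead of returning null.

```python
from typing import List


def merge_sorted(a: List[int], b: List[int]) -> List[int]:
    """Merge two already-sorted lists into one sorted list in O(len(a) + len(b))."""
    merged: List[int] = []
    i = j = 0
    # Repeatedly take the smaller head element; <= keeps the merge stable.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # At most one of these slices is non-empty.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged
```

Note the comparison uses separate indices for each list (a[i] against b[j]); indexing both lists with the same counter is a common slip in this problem.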
Asked at Adobe, Meta, Oracle + 1 more • a year ago
Determine if a given binary tree is a binary search tree (BST).
IDE
Medium
Data Engineer
Data Structures & Algorithms
+4 more
10 answers
Alvaro R. - "bool isValidBST(TreeNode* root, long min = LONG_MIN, long max = LONG_MAX){ if (root == NULL) return true; if (root->val <= min || root->val >= max) return false; return isValidBST(root->left, min, root->val) && isValidBST(root->right, root->val, max); }"
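The same bounds-passing idea can be sketched in Python. This is an illustrative version, not the answerer's code: the TreeNode dataclass and the name is_valid_bst are assumptions, and None stands in for "no bound" in place of LONG_MIN and LONG_MAX.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def is_valid_bst(
    node: Optional[TreeNode],
    low: Optional[int] = None,
    high: Optional[int] = None,
) -> bool:
    """Return True if every value in the subtree lies strictly between low and high."""
    if node is None:
        return True
    if low is not None and node.val <= low:
        return False
    if high is not None and node.val >= high:
        return False
    # Left subtree must stay below node.val, right subtree above it.
    return is_valid_bst(node.left, low, node.val) and is_valid_bst(
        node.right, node.val, high
    )
```

Checking each node only against its direct children is not enough; the bounds carry the constraint from every ancestor down the tree.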
Design a Data Warehouse Schema for Airbnb
Data Engineer
Data Modeling
Asked at Apple, LinkedIn, Meta + 1 more • a year ago
Partition an array into two sub-arrays with equal sum.
IDE
Medium
Data Engineer
Data Structures & Algorithms
+2 more
6 answers
Bhaskar B. - "This could be done using a two-pointer approach, assuming the array is sorted, with a left and a right pointer. We need to track two sums (left and right) as we move the pointers. We move the left pointer to the right by 1 (increment) when the right sum is greater, and move the right pointer to the left by 1 (decrement) when the left sum is greater. At some point the two sums become equal, and that's when we exit the loop. 0 to left will be one array and right to (n-1) will be the other. We are not going to mo..."
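As a point of comparison, here is a small Python sketch that uses a running prefix sum instead of two pointers. It assumes the question means a contiguous split (a prefix and the remaining suffix), which is an interpretation rather than something the listing states; the name split_equal_sum is hypothetical. It does not require a sorted array and also works when values are negative.

```python
from typing import List, Optional, Tuple


def split_equal_sum(nums: List[int]) -> Optional[Tuple[List[int], List[int]]]:
    """Split nums into a non-empty prefix and suffix with equal sums, or return None."""
    total = sum(nums)
    left_sum = 0
    # Stop one short of the end so the right part is never empty.
    for i in range(len(nums) - 1):
        left_sum += nums[i]
        if left_sum * 2 == total:
            return nums[: i + 1], nums[i + 1 :]
    return None
```

For example, split_equal_sum([1, 2, 3, 3, 3]) splits after the third element because both sides sum to 6.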
Design an ETL Pipeline for Slack for School
Data Engineer
Data Pipeline Design

Asked at Amazon, Anthropic, Discord + 1 more • 3 months ago
How do you encourage collaboration among cross-functional teams?
Data Engineer
Behavioral
+5 more
2 answers
Saurabh N. - "1) Have a common goal 2) Have clear and fair accountability between teams 3) Ensure conflicts on common issues are resolved promptly 4) Promote joint brainstorming and problem-solving sessions 5) Most important, establish and practise clear and effective communication"
Asked at Google • a year ago
When is Hadoop better than PySpark?
Data Engineer
Data Pipeline Design
1 answer
Joshua R. - "Hadoop is better than PySpark when you are dealing with extremely large-scale, batch-oriented, non-iterative workloads where in-memory computing isn't feasible or necessary, like log storage or ETL workflows that don't require fast response times. It's also better in situations where the Hadoop ecosystem is already deeply embedded and where there is a need for resource-conscious, fault-tolerant computation without the overhead of Spark's memory constraints. In such scenarios, Hadoop's disk-b..."
Asked at Apple • a year ago
Set Matrix Zeroes
Data Engineer
Data Structures & Algorithms
+2 more
3 answers
Anonymous Wasp - "I was able to provide the optimal approach and coded it up"
Asked at Databricks • a year ago
How would you handle scheduling dependencies between two nightly Jobs to ensure the second Job does not fail if the first Job runs longer than expected?
Data Engineer
Data Pipeline Design
1 answer
Anzhe M. - "Two questions come to mind: Does the second job have to kick off at 12:30 AM? Do other jobs depend on the second job? If both answers are no, we can simply postpone the second job to allow enough time for the first one to complete. If the answers are yes, we could let the second job retry up to a set number of times, making sure that even reaching the maximum number of retries won't delay or fail the downstream jobs."
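The retry idea in this answer can be sketched as a bounded wait on the first job's completion signal. This is a minimal illustration under stated assumptions: upstream_done is a hypothetical callable (for example, a check for a success marker written by the first job), and run_when_upstream_done, max_retries and wait_seconds are illustrative names, not the API of any particular scheduler.

```python
import time
from typing import Callable


def run_when_upstream_done(
    upstream_done: Callable[[], bool],
    job: Callable[[], None],
    max_retries: int = 5,
    wait_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run job once upstream_done() is true; give up after max_retries extra checks.

    Returns True if the job ran, False if the upstream never finished in time.
    """
    for attempt in range(max_retries + 1):
        if upstream_done():
            job()
            return True
        # Only wait if another check is still allowed.
        if attempt < max_retries:
            sleep(wait_seconds)
    return False
```

The worst-case delay is max_retries * wait_seconds, so those two values should be chosen so that a late or skipped second job still leaves room for anything scheduled after it. A False return is the place to raise an alert rather than fail silently.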
Fraudulent Transactions
IDE
Medium
Data Engineer
Coding
+3 more
6 answers
Jayveer S. - "WITH suspicious_transactions AS ( SELECT c.first_name, c.last_name, t.receipt_number, COUNT(t.receipt_number) OVER (PARTITION BY c.customerid) AS noofoffences FROM customers c JOIN transactions t ON c.customerid = t.customerid WHERE t.receipt_number LIKE '%999%' OR t.receipt_number LIKE '%1234%' OR t.receipt_number LIKE '%XYZ%' ) SELECT first_name, last_name, receipt_number, noofoffences FROM suspicious_transactions WHERE noofoffences >= 2;"
Asked at Adobe, Amazon, Apple + 10 more • a year ago
Calculate the trapped rainwater between bars in a given array.
IDE
Hard
Data Engineer
Data Structures & Algorithms
+4 more
12 answers
Anonymous Roadrunner - "from typing import List def traprainwater(height: List[int]) -> int: if not height: return 0 l, r = 0, len(height) - 1 leftMax, rightMax = height[l], height[r] res = 0 while l < r: if leftMax < rightMax: l += 1 leftMax = max(leftMax, height[l]) res += leftMax - height[l] else: r -= 1 rightMax = max(rightMax, height[r]) ..."
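Because the preview stops partway through the loop, here is a complete two-pointer sketch of the technique laid out with normal indentation. It is an illustration, not a reconstruction of the rest of the answer; the name trap_rain_water and the variable names are my own.

```python
from typing import List


def trap_rain_water(height: List[int]) -> int:
    """Total water trapped between bars, using two pointers in O(n) time and O(1) space."""
    if not height:
        return 0
    left, right = 0, len(height) - 1
    left_max, right_max = height[left], height[right]
    trapped = 0
    while left < right:
        # The smaller side's maximum bounds the water level on that side.
        if left_max < right_max:
            left += 1
            left_max = max(left_max, height[left])
            trapped += left_max - height[left]
        else:
            right -= 1
            right_max = max(right_max, height[right])
            trapped += right_max - height[right]
    return trapped
```

For the input [4, 2, 0, 3, 2, 5] this returns 9: the four inner bars hold 2, 4, 1 and 2 units under a water level of 4.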
Asked at Adobe, Apple, Intuit + 3 more • a year ago
Sudoku Solver
IDE
Hard
Data Engineer
Data Structures & Algorithms
+4 more
4 answers
Divya R. - "static boolean sudokuSolve(char[][] board) { return sudokuSolve(board, 0, 0); } static boolean sudokuSolve(char[][] board, int r, int c) { if(c>=board[0].length) { r=r+1; c=0; } if(r>=board.length) return true; if(board[r][c]=='.') { for(int num=1; num<=9; num++) { board[r][c]=(char)('0' + num); if(isValidPosition(board, r, c)) { if(sudokuSolve(board, r, c+1)) return true; } board[r][c]='.'; } } else { return sudokuSolve(board, r, c+1); } return false; } static boolean isValidPosition(char b..."
Asked at Apple, Goldman Sachs, Oracle • a year ago
Implement a hashmap without using any libraries.
Data Engineer
Data Structures & Algorithms
+2 more
1 answer
Md kamrul H. - "public class HashMap<T, V> { public class Element { T key; V value; Element(T k, V v) { this.key = k; this.value = v; } } private static final int DEFAULT_CAPACITY = 16; private static final float LOAD_FACTOR = 0.75f; private LinkedList<Element>[] table = new LinkedList[DEFAULT_CAPACITY]; private int size = 0; private int threshold = (int) (DEFAULT_CAPACITY * LOAD_FACTOR); public void put(T k..."
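The design visible in the preview (an array of buckets, a default capacity of 16, and a 0.75 load factor that triggers growth) can be sketched in Python as follows. This is a hedged illustration, not the rest of the truncated answer: the class name ChainedHashMap and its methods are assumptions, it relies on Python's built-in hash() and lists, and the typing import is only for annotations.

```python
from typing import Any, Hashable, List, Tuple


class ChainedHashMap:
    """Hash map with separate chaining; doubles its bucket array past the load factor."""

    def __init__(self, capacity: int = 16, load_factor: float = 0.75) -> None:
        self._buckets: List[List[Tuple[Hashable, Any]]] = [[] for _ in range(capacity)]
        self._load_factor = load_factor
        self._size = 0

    def _bucket_for(self, key: Hashable) -> List[Tuple[Hashable, Any]]:
        # Python's % always yields a non-negative index for a positive modulus.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key: Hashable, value: Any) -> None:
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > self._load_factor * len(self._buckets):
            self._resize()

    def get(self, key: Hashable, default: Any = None) -> Any:
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default

    def _resize(self) -> None:
        # Rehash every entry into a bucket array twice as large.
        old_buckets = self._buckets
        self._buckets = [[] for _ in range(len(old_buckets) * 2)]
        for bucket in old_buckets:
            for key, value in bucket:
                self._bucket_for(key).append((key, value))

    def __len__(self) -> int:
        return self._size
```

Resizing keeps chains short, so put and get stay close to constant time on average; without it, lookups degrade toward a linear scan as the map fills.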
Asked at Microsoft • 10 months ago
What is SQL?
Data Engineer
SQL
+2 more
2 answers
Rafia M. - "SQL stands for Structured Query Language."
Asked at Discord • a year ago
What other companies are you interviewing at and why?
Data Engineer
Behavioral
+4 more
Asked at Apple, Meta, Oracle • a year ago
Implement Trie
IDE
Medium
Data Engineer
Data Structures & Algorithms
+3 more
3 answers
Tiago R. - "class TrieNode { constructor() { this.children = {}; this.isEndOfWord = false; } } class Trie { constructor() { this.root = new TrieNode(); } insert(word) { let node = this.root; for (const char of word) { if (!node.children[char]) { node.children[char] = new TrieNode(); } node = node.children[char]; } node.isEndOfWord = true; } search(word) { l..."
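The structure in the preview (a node with a children map and an end-of-word flag) translates to Python roughly as below. This is an illustrative sketch: the preview is cut off inside search, so the bodies of search and starts_with here are my own, and starts_with in particular is an assumed method that does not appear in the visible part of the answer.

```python
from typing import Dict, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end_of_word = False


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            # Create the child on first use, then descend into it.
            node = node.children.setdefault(char, TrieNode())
        node.is_end_of_word = True

    def _walk(self, prefix: str) -> Optional[TrieNode]:
        """Follow prefix from the root; return the last node or None if the path breaks."""
        node = self.root
        for char in prefix:
            node = node.children.get(char)
            if node is None:
                return None
        return node

    def search(self, word: str) -> bool:
        node = self._walk(word)
        return node is not None and node.is_end_of_word

    def starts_with(self, prefix: str) -> bool:
        return self._walk(prefix) is not None
```

The end-of-word flag is what separates a stored word from a mere prefix: after inserting "apple", search("app") is False while starts_with("app") is True.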
Asked at Adobe, Apple • a year ago
Solve John Conway's "Game of Life".
Data Engineer
Data Structures & Algorithms
+2 more
Asked at Databricks • a year ago
What is a Medallion Architecture?
Data Engineer
Data Pipeline Design
2 answers
Ramagiri P. - "Medallion architecture is a layered data architecture used in lakehouse systems. Data flows through Bronze, Silver, and Gold layers, where each layer improves data quality. Bronze stores raw data, Silver contains cleaned and validated datasets, and Gold provides aggregated, business-ready data for analytics and reporting. bronze_df = spark.read.json("/landing/apidata") bronze_df.write.format("delta").save("/bronze/users")"
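To make the Silver and Gold steps concrete, here is a library-free Python sketch that uses plain lists of dicts as a stand-in for the Spark and Delta tables in the answer. Everything in it is hypothetical: the user_id and country fields, the cleaning rules, and the function names to_silver and to_gold are chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def to_silver(bronze: List[dict]) -> List[dict]:
    """Clean raw records: drop rows without user_id, dedupe on user_id, normalize country."""
    seen = set()
    silver = []
    for row in bronze:
        user_id = row.get("user_id")
        if user_id is None or user_id in seen:
            continue  # invalid or duplicate record
        seen.add(user_id)
        country = (row.get("country") or "unknown").strip().upper()
        silver.append({"user_id": user_id, "country": country})
    return silver


def to_gold(silver: List[dict]) -> Dict[str, int]:
    """Aggregate cleaned records into a business-ready metric: users per country."""
    counts: Dict[str, int] = defaultdict(int)
    for row in silver:
        counts[row["country"]] += 1
    return dict(counts)
```

Bronze stays untouched as the raw record of what arrived, so the Silver and Gold outputs can be rebuilt from it if the cleaning rules change.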
Asked at Walmart Labs • a year ago
Why do you want to work at Walmart Labs?
Data Engineer
Behavioral
+5 more
Asked at Uber • a year ago
Design a rewarding system.
Data Engineer
Coding
1 answer
Anzhe M. - "Not my answer, but rather the details of this question. It should include the following functions: int insertNewCustomer(double revenue) -> returns a customer ID (assume auto-incremented & 0-based); int insertNewCustomer(double revenue, int referrerID) -> returns a customer ID (assume auto-incremented & 0-based); Set getLowestKCustomersByMinTotalRevenue(int k, double minTotalRevenue) -> returns customer IDs. Note: The total revenue consists of the revenue that this customer bring..."