Recent Data Engineer Interview Questions

Review this list of 160 Data Engineer interview questions and answers verified by hiring managers and candidates.

Marketing Channel Attribution
IDE
Medium
Data Engineer
Coding
+3 more
14 answers
+10
G B. - "select user_id, b.marketing_channel from user_sessions a Left join attribution b on b.sessionid = a.sessionid group by 1,2 HAVING sum(purchasevalue)>100 and min(adclick_timestamp)"
Find Campaign Purchases
IDE
Medium
Data Engineer
Coding
+3 more
12 answers
+9
Alina G. - "SELECT upsellcampaignid, COUNT(DISTINCT trans.userid) AS eligibleusers FROM campaign JOIN "transaction" AS trans ON transactiondate BETWEEN datestart AND date_end JOIN user ON trans.userid = user.userid WHERE iseligibleforupsellcampaign = 1 GROUP BY upsellcampaignid"
Post Success By Age Group.
IDE
Medium
Data Engineer
Coding
+3 more
15 answers
+12
Bhavna S. - "with youngsuccrate as( select strftime('%m', postdate) AS post_month, round(sum(issuccessfulpost)*1.0/count(issuccessfulpost),2) as yascrate from post where user_id in (select user_id from post_user where age between 0 and 18) group by post_month ), nonyoungsucc_rate as( select strftime('%m', postdate) AS post_month, round(sum(issuccessfulpost)*1.0/count(issuccessfulpost),2) as nonyasc_rate from post where user_id in (select"
Employee Earnings.
IDE
Medium
Data Engineer
Coding
+4 more
75 answers
+68
Ravi K. - "select e.firstname as firstname, m.salary as manager_salary from employees e join employees m on e.manager_id = m.id where e.salary > m.salary;"
High Volume Low Success.
IDE
Easy
Data Engineer
Coding
+3 more
14 answers
+11
Peter W. - "The question says "above the overall average total posts", which to me implies >, yet the solution uses >=. It took me an hour to find out. Please fix."

Asked at LinkedIn • 12 days ago
Monthly Post Success Analysis.
IDE
Easy
Data Engineer
Coding
+4 more
36 answers
+31
David I. - "WITH filtered_posts AS ( SELECT p.user_id, p.issuccessfulpost FROM post p WHERE p.postdate >= '2023-11-01' AND p.postdate < '2023-12-01' ), post_summary AS ( SELECT pu.user_type, COUNT(*) AS post_attempt, SUM(CASE WHEN fp.issuccessfulpost = 1 THEN 1 ELSE 0 END) AS post_success FROM filtered_posts fp JOIN postuser pu ON fp.user_id = pu.user_id GROUP BY pu.user_type ) SELECT user_type, post_success, post_attempt, CAST(post_success AS FLOAT) / post_attempt AS postsuccessrate FROM po"
Asked at Adobe, Amazon, Apple + 2 more • 3 months ago
Edit distance
IDE
Hard
Data Engineer
Data Structures & Algorithms
+3 more
40 answers
+32
叶路. - "from collections import deque def updateword(words, start_word, end_word): if end_word not in words: return None # Early exit if end_word is not in the dictionary queue = deque([(start_word, 0)]) # (word, steps) visited = set([start_word]) # Keep track of visited words while queue: word, steps = queue.popleft() if word == end_word: return steps # Found the target word, return steps for i in range(len(word)): "
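The preview above is cut off inside its inner loop. Here is a minimal sketch of how a breadth-first search of this kind might be completed. It assumes each step replaces one lowercase letter and that every intermediate word must appear in `words`. The name `word_steps` and the sample dictionary are hypothetical, not the author's.

```python
from collections import deque
from string import ascii_lowercase


def word_steps(words, start_word, end_word):
    """Return the fewest one-letter substitutions from start_word to end_word, or None."""
    word_set = set(words)
    if end_word not in word_set:
        return None  # the target can never be reached through the dictionary
    queue = deque([(start_word, 0)])  # (word, steps taken so far)
    visited = {start_word}
    while queue:
        word, steps = queue.popleft()
        if word == end_word:
            return steps
        for i in range(len(word)):
            for letter in ascii_lowercase:
                # Assumption: a "step" swaps exactly one character.
                candidate = word[:i] + letter + word[i + 1:]
                if candidate in word_set and candidate not in visited:
                    visited.add(candidate)
                    queue.append((candidate, steps + 1))
    return None
```

For example, `word_steps(["hot", "dot", "dog", "cog"], "hit", "cog")` walks hit, hot, dot, dog, cog. Because BFS explores words in order of distance, the first time the target is dequeued its step count is minimal.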
Asked at Meta, PayPal • a year ago
Squares of sorted array
Data Engineer
Data Structures & Algorithms
+2 more
1 answer
Mahaboob P. - "int[] sqSorted(int[] nums) { int i = 0, j = nums.length-1; int k = nums.length-1; int[] sqs = new int[nums.length]; while(i <= j) { int n1 = nums[i]*nums[i]; int n2 = nums[j]*nums[j]; if(n2 > n1) { sqs[k--] = n2; j--; } else { sqs[k--] = n1; i++; } } for(int n: sqs) System.out.println(n); return sqs; }"
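The two-pointer idea in the answer above can be sketched in Python as follows. This is one possible rendering, assuming the input is sorted in ascending order and may contain negative numbers. The name `squares_sorted` is hypothetical.

```python
def squares_sorted(nums):
    """Return the squares of an ascending list, also in ascending order, in O(n)."""
    result = [0] * len(nums)
    left, right = 0, len(nums) - 1
    # Fill from the back: the largest remaining square is always at one of the two ends.
    for k in range(len(nums) - 1, -1, -1):
        left_sq = nums[left] ** 2
        right_sq = nums[right] ** 2
        if right_sq > left_sq:
            result[k] = right_sq
            right -= 1
        else:
            result[k] = left_sq
            left += 1
    return result
```

This avoids the O(n log n) cost of squaring everything and re-sorting, at the price of one extra output list.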
Explain the differences between multithreading and multiprocessing.
Data Engineer
Concept
4 answers
+1
Sue G. - "Multithreading: Multiple threads run within the same process, sharing memory. More lightweight, with faster context switching. Shared memory brings potential synchronization issues; use locks or the synchronized keyword to handle them. Multiprocessing: Multiple processes run independently, each with its own memory space. More heavyweight because each process has its own resources, which reduces shared data corruption issues. Slower, since separate processes must be managed. Need to use IPC mechanisms like pipes, sockets an"
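To make the shared-memory point concrete, here is a small sketch using Python's standard `threading` module. Several threads update one counter, and a lock keeps each update from interleaving with another. The names `add_many` and `run_threads` are hypothetical and chosen only for illustration.

```python
import threading

counter = 0  # shared by every thread in this process
lock = threading.Lock()


def add_many(times):
    """Increment the shared counter; the lock makes each read-modify-write atomic."""
    global counter
    for _ in range(times):
        with lock:
            counter += 1


def run_threads(num_threads=4, times=10_000):
    """Start the workers, wait for all of them, and return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With `multiprocessing`, each worker would instead get its own copy of `counter`, so results would have to come back through an IPC channel such as a pipe or a queue.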
What data tools have you worked with, and what specific projects did you use those tools for?
Data Engineer
Technical
1 answer
Aneesh D. - "I have worked with tools such as Hadoop, Oracle, DBeaver, and Databricks. I have used these tools at work when dealing with large datasets, data cleaning, data security, and creating models."
Asked at Adobe, Apple, Meta + 2 more • a year ago
Build a Calculator
IDE
Medium
Data Engineer
Data Structures & Algorithms
+3 more
5 answers
+2
Sarvesh G. - "def calc(expr): ans = eval(expr) return ans your code goes debug your code below print(calc("1 + 1"))"
Asked at Adobe, Apple, Salesforce + 1 more • a year ago
Write a function to return all prime numbers up to a given number n.
IDE
Medium
Data Engineer
Data Structures & Algorithms
+4 more
12 answers
+8
Jeff S. - "function findPrimes(n) { if (n 1; i--) { if (num % i === 0) { notPrimes.add(num); return false; } } return true; } for (let i = 2; i 5 && !notPr"
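The preview above is too damaged to read reliably, so the following is not a reconstruction of it. It is a separate sketch of one common approach, the Sieve of Eratosthenes, assuming "up to n" is inclusive. The name `primes_up_to` is hypothetical.

```python
def primes_up_to(n):
    """Return every prime p with 2 <= p <= n, using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Smaller multiples of p were already crossed out by smaller primes.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]
```

The sieve trades O(n) memory for speed: it never divides, it only crosses out multiples.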
Asked at Adobe, Anthropic, Apple + 18 more • 7 days ago
Implement LRU Cache.
IDE
Hard
Data Engineer
Data Structures & Algorithms
+6 more
31 answers
+26
Alfred O. - "We can use a dictionary to store cache items so that read and write operations are O(1). Each time we read or update an existing record, we must move the item to the back of the cache. This lets us evict the first item whenever the cache is full and we need to add a new record, which also makes eviction O(1). Instead of a normal dictionary, we will use an ordered dictionary to store cache items. This will allow us to efficiently move items to back of the cache a"
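The ordered-dictionary approach described above might look like the sketch below, built on `collections.OrderedDict` from the standard library. The method names `get` and `put`, and returning `None` on a miss, are assumptions, since the full answer is not shown.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache: the front of the dict is the oldest entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None  # assumption: a miss is reported as None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry
```

Both `move_to_end` and `popitem(last=False)` run in constant time, which is what keeps reads, writes, and evictions O(1).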
Asked at Apple, Goldman Sachs, Meta + 1 more • 11 days ago
Implement Trie
IDE
Medium
Data Engineer
Data Structures & Algorithms
+4 more
3 answers
Divya R. - "class Trie { private TrieNode root; public Trie() { root = new TrieNode(); } public void insert(String word) { TrieNode temp=root; for(int i=0; i<word.length(); i++) { if(!temp.children.containsKey(word.charAt(i))) { temp.children.put(word.charAt(i), new TrieNode()); } temp=temp.children.get(word.charAt(i)); } temp.isEndOfWord=true; } public boolean search(String word) { TrieNode temp=root; for(int i=0; i<word.length(); i++) { if(!temp.children.containsKey(word.charAt(i))) { return false; } temp"
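For comparison, here is a compact Python sketch of the same structure: a map of children per node plus an end-of-word flag. The `starts_with` method and the `_walk` helper are assumptions added for illustration. The visible part of the answer shows only `insert` and the start of `search`.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.is_end_of_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end_of_word = True

    def _walk(self, text):
        """Follow text from the root; return the final node, or None if the path breaks."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        node = self._walk(word)
        return node is not None and node.is_end_of_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None
```

Every operation costs time proportional to the length of the word, regardless of how many words the trie holds.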
Asked at Apple, Booking.com, Goldman Sachs + 10 more • 9 days ago
Valid Parentheses
IDE
Easy
Data Engineer
Data Structures & Algorithms
+4 more
24 answers
+21
Aikya S. - "def is_valid(s: str) -> bool: openBracket = set() openBracket.add('{') openBracket.add('(') openBracket.add('[') stack = [] for c in s: if stack and (c == ')' and stack[len(stack)-1] == '(')\ or\ (c == '}' and stack[len(stack)-1] == '{')\ or\ (c == ']' and stack[len(stack)-1] == '['): stack.pop() elif c in openBracket: stack.append(c) else: retu"
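The stack technique in the answer above can be written a little more compactly with a lookup table from each closing bracket to its opener. This is a sketch, not the author's full solution. It assumes any character that is not a bracket makes the string invalid.

```python
def is_valid(s):
    """Return True if every bracket in s is closed by the right type in the right order."""
    pairs = {")": "(", "}": "{", "]": "["}
    openers = set(pairs.values())
    stack = []
    for c in s:
        if c in openers:
            stack.append(c)
        elif c in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[c]:
                return False
        else:
            return False  # assumption: non-bracket characters are not allowed
    return not stack  # leftover openers mean something was never closed
```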
What types of indexes are in a relational database?
Data Engineer
Technical
1 answer
Erjan G. - "I said there are hashed, clustered, and non-clustered indexes."
Explain the differences between Parquet and Avro.
Data Engineer
Technical
2 answers
Dessalew A. - "Parquet = reading only the columns you need in a spreadsheet. Avro = reading full rows one at a time."
Explain the differences between wide and narrow dependencies in Apache Spark.
Data Engineer
Technical
1 answer
Erjan G. - "I failed to answer; I did not know."
Asked at Walmart Labs • a year ago
Tell me about your e-commerce experience.
Data Engineer
Behavioral
+2 more
1 answer
Rajeev K. - "I’ve spent over 6 years building and scaling e-commerce products across EMEA and APAC. At Jumia, I led product initiatives on the checkout and payments side. For example, I launched gamified promotions on PDP and checkout that improved engagement and delivered a 2.3x uplift in conversion. I also introduced automated installment payments and order cancellation flows, which not only improved user trust but also reduced complaints by 30% and lowered operational costs. Before that, at Lazada, I work"
Asked at Amazon, Apple, Oracle + 3 more • a year ago
Course Schedule
IDE
Medium
Data Engineer
Data Structures & Algorithms
+4 more
9 answers
+6
Gabriele G. - "DFS with a check for an already seen node in the graph would work. from collections import deque, defaultdict from typing import List def iscourseloopdfs(id_course: int, graph: defaultdict[list]) -> bool: stack = deque([(id_course)]) seen_courses = set() while stack: print(stack) curr_course = stack.pop() if curr_course in seen_courses: return True seen_courses.add(curr_course) for dependency in graph[curr_course]: "
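One caveat with a single "seen" set is that a course reachable along two different prerequisite paths can be mistaken for a loop. The sketch below therefore tracks courses on the current path separately from courses already cleared. It assumes the usual input shape of `num_courses` plus `[course, prerequisite]` pairs, which the listing does not show. The name `can_finish` is hypothetical.

```python
from collections import defaultdict


def can_finish(num_courses, prerequisites):
    """Return True if the prerequisite graph has no cycle."""
    graph = defaultdict(list)
    for course, prereq in prerequisites:
        graph[course].append(prereq)

    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = [UNSEEN] * num_courses

    def has_cycle(course):
        if state[course] == ON_PATH:
            return True  # reached a course that is still being explored
        if state[course] == DONE:
            return False  # already cleared through an earlier path
        state[course] = ON_PATH
        for dep in graph[course]:
            if has_cycle(dep):
                return True
        state[course] = DONE
        return False

    return not any(has_cycle(c) for c in range(num_courses))
```

The recursion is fine for small inputs. For very deep prerequisite chains an explicit stack or Kahn's topological sort would avoid Python's recursion limit.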