Recent Data Engineer Interview Questions

Review this list of 160 Data Engineer interview questions and answers verified by hiring managers and candidates.
  • 14 answers

    G B. - "select user_id, b.marketing_channel from user_sessions a Left join attribution b on b.sessionid = a.sessionid group by 1,2 HAVING sum(purchasevalue)>100 and min(adclick_timestamp)"

    Data Engineer
    Coding
    +3 more
  • 12 answers

    Alina G. - "SELECT upsellcampaignid, COUNT(DISTINCT trans.userid) AS eligibleusers FROM campaign JOIN "transaction" AS trans ON transactiondate BETWEEN datestart AND date_end JOIN user ON trans.userid = user.userid WHERE iseligibleforupsellcampaign = 1 GROUP BY upsellcampaignid"

    Data Engineer
    Coding
    +3 more
  • 15 answers

    Bhavna S. - "with youngsuccrate as( select strftime('%m', postdate) AS postmonth, round(sum(issuccessfulpost)*1.0/count(issuccessfulpost),2)as yascrate from post where userid in (select userid from post_user where age between 0 and 18) group by post_month ), nonyoungsucc_rate as( select strftime('%m', postdate) AS postmonth, round(sum(issuccessfulpost)*1.0/count(issuccessfulpost),2)as nonyasc_rate from post where user_id in (select"

    Data Engineer
    Coding
    +3 more
  • 75 answers
    Video answer for 'Employee Earnings.'

    Ravi K. - "select e.firstname as firstname, m.salary as manager_salary from employees e join employees m on e.manager_id = m.id where e.salary > m.salary;"

    Data Engineer
    Coding
    +4 more
  • 14 answers

    Peter W. - "The question says "above the overall average total posts", which to me implies >, yet the solution uses >=. It took me an hour to find out. Please fix."

    Data Engineer
    Coding
    +3 more
  • Asked at LinkedIn
    36 answers

    David I. - "WITH filtered_posts AS ( SELECT p.user_id, p.issuccessfulpost FROM post p WHERE p.postdate >= '2023-11-01' AND p.postdate < '2023-12-01' ), post_summary AS ( SELECT pu.user_type, COUNT(*) AS post_attempt, SUM(CASE WHEN fp.issuccessfulpost = 1 THEN 1 ELSE 0 END) AS post_success FROM filtered_posts fp JOIN postuser pu ON fp.userid = pu.user_id GROUP BY pu.user_type ) SELECT user_type, post_success, post_attempt, CAST(postsuccess AS FLOAT) / postattempt AS postsuccessrate FROM po"

    Data Engineer
    Coding
    +4 more
  • Asked at Adobe
    40 answers
    Video answer for 'Edit distance'

    叶 路. - "from collections import deque def updateword(words, startword, end_word): if end_word not in words: return None # Early exit if end_word is not in the dictionary queue = deque([(start_word, 0)]) # (word, steps) visited = set([start_word]) # Keep track of visited words while queue: word, steps = queue.popleft() if word == end_word: return steps # Found the target word, return steps for i in range(len(word)): "

    Data Engineer
    Data Structures & Algorithms
    +3 more
  • Asked at Meta
    1 answer

    Mahaboob P. - "int[] sqSorted(int[] nums) { int i = 0, j = nums.length-1; int k = nums.length-1; int[] sqs = new int[nums.length]; while(i n1) { sqs[k--] = n2; j--; } else { sqs[k--] = n1; i++; } } for(int n: sqs) System.out.println(n); return sqs; }"

    Data Engineer
    Data Structures & Algorithms
    +2 more
  • Sue G. - "Multithreading: Multiple threads run within the same process, sharing memory. More lightweight, with faster context switching. Shared memory brings potential synchronization issues; use Lock or synchronized keywords to handle them. Multiprocessing: Multiple processes run independently, each with its own memory space. More heavyweight because each process has its own resources, which reduces shared data corruption issues. Slower; need to manage separate processes. Need to use IPC mechanisms like pipes, sockets an"

    Data Engineer
    Concept
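The thread half of the answer above can be illustrated with a minimal sketch. It assumes Python's standard `threading` module; the function name `run_counter` and its parameters are hypothetical, chosen only for illustration. Several threads update one shared variable, and a lock serializes the update so that no increments are lost.

```python
import threading


def run_counter(num_threads: int = 4, increments: int = 10_000) -> int:
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # All threads share `counter`; the lock makes the
            # read-modify-write step safe against interleaving.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter
```

Separate processes would not see the same `counter` at all, which is why the answer mentions IPC mechanisms for the multiprocessing case.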
  • Aneesh D. - "I have worked with tools such as Hadoop, Oracle, DBeaver, and Databricks. I have used these tools at work when dealing with large datasets, data cleaning, data security, and creating models"

    Data Engineer
    Technical
  • Asked at Adobe
    5 answers

    Sarvesh G. - "def calc(expr): ans = eval(expr) return ans your code goes debug your code below print(calc("1 + 1"))"

    Data Engineer
    Data Structures & Algorithms
    +3 more
  • Asked at Adobe
    12 answers

    Jeff S. - "function findPrimes(n) { if (n 1; i--) { if (num % i === 0) { notPrimes.add(num); return false; } } return true; } for (let i = 2; i 5 && !notPr"

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Asked at Adobe
    31 answers

    Alfred O. - "We can use a dictionary to store cache items so that our read/write operations will be O(1). Each time we read or update an existing record, we have to ensure the item is moved to the back of the cache. This will allow us to evict the first item in the cache whenever the cache is full and we need to add new records, also making our eviction O(1). Instead of a normal dictionary, we will use an ordered dictionary to store cache items. This will allow us to efficiently move items to the back of the cache a"

    Data Engineer
    Data Structures & Algorithms
    +6 more
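The ordered-dictionary approach described in the answer above could be sketched roughly as follows. This is a minimal illustration, not the answerer's full solution: the class name `LRUCache` and the method names `get` and `put` are assumptions, and it relies on `collections.OrderedDict` from the standard library.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Least-recently-used cache; the front of the dict is the eviction end."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Optional[Any] = None) -> Any:
        if key not in self._items:
            return default
        # A read counts as a use, so move the item to the back.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            # Updating an existing record also moves it to the back.
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict the first (least recently used) item.
            self._items.popitem(last=False)
```

With a capacity of 2, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, because `a` was moved to the back by the read.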
  • Asked at Apple
    3 answers

    Divya R. - "class Trie { private TrieNode root; public Trie() { root = new TrieNode(); } public void insert(String word) { TrieNode temp=root; for(int i=0; i<word.length(); i++) { if(!temp.children.containsKey(word.charAt(i))) { temp.children.put(word.charAt(i), new TrieNode()); } temp=temp.children.get(word.charAt(i)); } temp.isEndOfWord=true; } public boolean search(String word) { TrieNode temp=root; for(int i=0; i<word.length(); i++) { if(!temp.children.containsKey(word.charAt(i))) { return false; } temp"

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Asked at Apple
    24 answers

    Aikya S. - "def is_valid(s: str) -> bool: openBracket = set() openBracket.add('{') openBracket.add('(') openBracket.add('[') stack = [] for c in s: if stack and (c == ')' and stack[len(stack)-1] == '(')\ or\ (c == '}' and stack[len(stack)-1] == '{')\ or\ (c == ']' and stack[len(stack)-1] == '['): stack.pop() elif c in openBracket: stack.append(c) else: retu"

    Data Engineer
    Data Structures & Algorithms
    +4 more
  • 1 answer

    Erjan G. - "I said there are hashed, clustered, and non-clustered."

    Data Engineer
    Technical
  • 2 answers

    Dessalew A. - "Parquet = reading only the columns you need in a spreadsheet. Avro = reading full rows one at a time."

    Data Engineer
    Technical
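The analogy above reflects the fact that Parquet is a columnar format while Avro is row-oriented. The sketch below only imitates the two layouts with plain Python structures; it does not read or write real Parquet or Avro files, and the sample data and function names (`ROWS`, `to_columns`, and the two `total_amount_*` helpers) are hypothetical.

```python
from typing import Any, Dict, List

# Row-oriented layout: each record is stored whole, one after another.
ROWS: List[Dict[str, Any]] = [
    {"user_id": 1, "country": "DE", "amount": 10.0},
    {"user_id": 2, "country": "US", "amount": 5.5},
    {"user_id": 3, "country": "US", "amount": 20.0},
]


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot records into a column-oriented layout (one list per field).

    Assumes every row carries the same set of fields.
    """
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_amount_row_oriented(rows: List[Dict[str, Any]]) -> float:
    # Every full record is visited even though only one field is needed.
    return sum(row["amount"] for row in rows)


def total_amount_column_oriented(columns: Dict[str, List[Any]]) -> float:
    # Only the "amount" column is touched; the other columns are never read.
    return sum(columns["amount"])
```

Both functions return the same total; the difference is how much data has to be touched to compute it, which is why column layouts tend to suit analytical scans and row layouts suit record-at-a-time processing.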
  • Erjan G. - "I failed to answer; I did not know."

    Data Engineer
    Technical
  • Asked at Walmart Labs
    1 answer

    Rajeev K. - "I’ve spent over 6 years building and scaling e-commerce products across EMEA and APAC. At Jumia, I led product initiatives on the checkout and payments side. For example, I launched gamified promotions on PDP and checkout that improved engagement and delivered a 2.3x uplift in conversion. I also introduced automated installment payments and order cancellation flows, which not only improved user trust but also reduced complaints by 30% and lowered operational costs. Before that, at Lazada, I work"

    Data Engineer
    Behavioral
    +2 more
  • Asked at Amazon
    9 answers

    Gabriele G. - "DFS with check of an already seen node in the graph would work from collections import deque, defaultdict from typing import List def iscourseloopdfs(idcourse: int, graph: defaultdict[list]) -> bool: stack = deque([(id_course)]) seen_courses = set() while stack: print(stack) curr_course = stack.pop() if currcourse in seencourses: return True seencourses.add(currcourse) for dependency in graph[curr_course]: "

    Data Engineer
    Data Structures & Algorithms
    +4 more
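One common way to check a course-dependency graph for loops is a DFS that distinguishes nodes on the current path from nodes that are fully explored. The sketch below is a swapped-in illustration of that technique, not the truncated answer above. It assumes the graph is a dictionary mapping each course to a list of its dependencies; the name `has_cycle` is hypothetical. Note that a single "seen" set on its own can report a cycle when two courses merely share a dependency, which is why three states are tracked here.

```python
from typing import Dict, Hashable, List

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored


def has_cycle(graph: Dict[Hashable, List[Hashable]]) -> bool:
    """Return True if the directed graph contains a cycle (iterative DFS)."""
    state: Dict[Hashable, int] = {}
    for start in graph:
        if state.get(start, WHITE) != WHITE:
            continue
        state[start] = GRAY
        stack = [(start, iter(graph.get(start, [])))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                nxt_state = state.get(nxt, WHITE)
                if nxt_state == GRAY:
                    # Back edge to a node on the current path: a loop.
                    return True
                if nxt_state == WHITE:
                    state[nxt] = GRAY
                    stack.append((nxt, iter(graph.get(nxt, []))))
                    break
            else:
                # All neighbours handled: the node is fully explored.
                state[node] = BLACK
                stack.pop()
    return False
```

A diamond such as `{1: [2, 3], 2: [4], 3: [4], 4: []}` is reported as acyclic, while `{1: [2], 2: [3], 3: [1]}` is reported as containing a loop.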
Showing 81-100 of 160