Skip to main content

Top Data Engineer Interview Questions

Review this list of 167 Data Engineer interview questions and answers verified by hiring managers and candidates.
  • Adobe logoAsked at Adobe 
    2 answers

    "func isMatch(text: String, pattern: String) -> Bool { // Convert strings to arrays for easier indexing let s = Array(text.characters) let p = Array(pattern.characters) guard !s.isEmpty && !p.isEmpty else { return true } // Create DP table: dpi represents if s[0...i-1] matches p[0...j-1] var dp = Array(repeating: Array(repeating: false, count: p.count + 1), count: s.count + 1) // Empty pattern matches empty string dp[0]["

    Reno S. - "func isMatch(text: String, pattern: String) -> Bool { // Convert strings to arrays for easier indexing let s = Array(text.characters) let p = Array(pattern.characters) guard !s.isEmpty && !p.isEmpty else { return true } // Create DP table: dpi represents if s[0...i-1] matches p[0...j-1] var dp = Array(repeating: Array(repeating: false, count: p.count + 1), count: s.count + 1) // Empty pattern matches empty string dp[0]["See full answer

    Data Engineer
    Data Structures & Algorithms
    +3 more
  • "What do all data scientists need to know about how to work with very large datasets? 37 Follow Request Answer More All related (39) Recommended 📷 Corrin Lakeland · Follow , M.S. Data Science, University of St. Thomas, St. Paul (2018)6yData Science consultant and managerUpvoted by[Tom Halloin](https://www.quora"

    Hayatu H. - "What do all data scientists need to know about how to work with very large datasets? 37 Follow Request Answer More All related (39) Recommended 📷 Corrin Lakeland · Follow , M.S. Data Science, University of St. Thomas, St. Paul (2018)6yData Science consultant and managerUpvoted by[Tom Halloin](https://www.quora"See full answer

    Data Engineer
    Data Modeling
  • Adobe logoAsked at Adobe 
    Add answer
    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Adobe logoAsked at Adobe 
    1 answer

    "Leetcode 347: Heap + Hashtable Follow up question: create heap with the length of K instead of N (more time complexity but less space )"

    Chen J. - "Leetcode 347: Heap + Hashtable Follow up question: create heap with the length of K instead of N (more time complexity but less space )"See full answer

    Data Engineer
    Data Structures & Algorithms
    +3 more
  • Apple logoAsked at Apple 
    3 answers

    "Recursion: 0 if NULL, else 1+max(height(left), height(right))"

    Mohith J. - "Recursion: 0 if NULL, else 1+max(height(left), height(right))"See full answer

    Data Engineer
    Data Structures & Algorithms
    +3 more
  • 🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

  • Adobe logoAsked at Adobe 
    Add answer
    Data Engineer
    Data Structures & Algorithms
    +4 more
  • Deloitte logoAsked at Deloitte 
    3 answers

    "BETWEEN and HAVING clauses in SQL serve different purposes: 1. BETWEEN Clause Used to filter rows based on a range of values. Works with numeric, date, or text values. Can be used with WHERE or HAVING clauses. The range includes both lower and upper bounds. Example: Filtering employees with salaries between 30,000 and 50,000 `SELECT * FROM Employees WHERE salary BETWEEN 30000 AND 50000;` 2. HAVING Clause Used to filter **groups"

    Meenakshi D. - "BETWEEN and HAVING clauses in SQL serve different purposes: 1. BETWEEN Clause Used to filter rows based on a range of values. Works with numeric, date, or text values. Can be used with WHERE or HAVING clauses. The range includes both lower and upper bounds. Example: Filtering employees with salaries between 30,000 and 50,000 `SELECT * FROM Employees WHERE salary BETWEEN 30000 AND 50000;` 2. HAVING Clause Used to filter **groups"See full answer

    Data Engineer
    Concept
    +4 more
  • Apple logoAsked at Apple 
    3 answers

    "I was able to provide the optimal approach and coded it up"

    Anonymous Wasp - "I was able to provide the optimal approach and coded it up"See full answer

    Data Engineer
    Data Structures & Algorithms
    +2 more
  • Walmart Labs logoAsked at Walmart Labs 
    1 answer

    "I’ve spent over 6 years building and scaling e-commerce products across EMEA and APAC. At Jumia, I led product initiatives on the checkout and payments side. For example, I launched gamified promotions on PDP and checkout that improved engagement and delivered a 2.3x uplift in conversion. I also introduced automated installment payments and order cancellation flows, which not only improved user trust but also reduced complaints by 30% and lowered operational costs. Before that, at Lazada, I work"

    Rajeev K. - "I’ve spent over 6 years building and scaling e-commerce products across EMEA and APAC. At Jumia, I led product initiatives on the checkout and payments side. For example, I launched gamified promotions on PDP and checkout that improved engagement and delivered a 2.3x uplift in conversion. I also introduced automated installment payments and order cancellation flows, which not only improved user trust but also reduced complaints by 30% and lowered operational costs. Before that, at Lazada, I work"See full answer

    Data Engineer
    Behavioral
    +2 more
  • Google logoAsked at Google 
    1 answer

    "Hadoop is better than PySpark when you are dealing with extremely large scale, batch oriented, non-iterative workloads where in-memory computing isn't feasible/ necessary, like log storage or ETL workflows that don't require high response times. It's also better in situations where the Hadoop ecosystem is already deeply embedded and where there is a need for resource conscious, fault tolerant computation without the overhead of Spark's memory constraints. In these such scenarios, Hadoop's disk-b"

    Joshua R. - "Hadoop is better than PySpark when you are dealing with extremely large scale, batch oriented, non-iterative workloads where in-memory computing isn't feasible/ necessary, like log storage or ETL workflows that don't require high response times. It's also better in situations where the Hadoop ecosystem is already deeply embedded and where there is a need for resource conscious, fault tolerant computation without the overhead of Spark's memory constraints. In these such scenarios, Hadoop's disk-b"See full answer

    Data Engineer
    Data Pipeline Design
  • +1

    "a process can include many threads. good for concurrent and parallel task execution"

    Erjan G. - "a process can include many threads. good for concurrent and parallel task execution"See full answer

    Data Engineer
    Concept
  • "There are 2 questions popping into my mind: Should the 2nd job have to kick off at 12:30AM? Are there others depending on the 2nd job? If both answers are no, we may simply postpone the second job to allow sufficient time for the first one to complete. If they are yeses, we could let the 2nd job retry to a certain amount of times. Make sure that even reaching the maximum of retries won't delay or fail the following jobs."

    Anzhe M. - "There are 2 questions popping into my mind: Should the 2nd job have to kick off at 12:30AM? Are there others depending on the 2nd job? If both answers are no, we may simply postpone the second job to allow sufficient time for the first one to complete. If they are yeses, we could let the 2nd job retry to a certain amount of times. Make sure that even reaching the maximum of retries won't delay or fail the following jobs."See full answer

    Data Engineer
    Data Pipeline Design
  • Discord logoAsked at Discord 
    Add answer
    Data Engineer
    Behavioral
    +4 more
  • Apple logoAsked at Apple 
    1 answer

    "public class HashMap { public class Element { T key; V value; Element(T k, V v) { this.key = k; this.value = v; } } private static final int DEFAULT_CAPACITY = 16; private static final float LOAD_FACTOR = 0.75f; private LinkedList[] table = new LinkedList[DEFAULT_CAPACITY]; private int size = 0; private int threshold = (int) (DEFAULTCAPACITY * LOADFACTOR); public void put(T k"

    Md kamrul H. - "public class HashMap { public class Element { T key; V value; Element(T k, V v) { this.key = k; this.value = v; } } private static final int DEFAULT_CAPACITY = 16; private static final float LOAD_FACTOR = 0.75f; private LinkedList[] table = new LinkedList[DEFAULT_CAPACITY]; private int size = 0; private int threshold = (int) (DEFAULTCAPACITY * LOADFACTOR); public void put(T k"See full answer

    Data Engineer
    Data Structures & Algorithms
    +2 more
  • Databricks logoAsked at Databricks 
    2 answers

    "Medallion architecture is a layered data architecture used in lakehouse systems. Data flows through Bronze, Silver, and Gold layers where each layer improves data quality. Bronze stores raw data, Silver contains cleaned and validated datasets, and Gold provides aggregated business-ready data for analytics and reporting bronzedf = spark.read.json("/landing/apidata") bronze_df.write.format("delta").save("/bronze/users")"

    Ramagiri P. - "Medallion architecture is a layered data architecture used in lakehouse systems. Data flows through Bronze, Silver, and Gold layers where each layer improves data quality. Bronze stores raw data, Silver contains cleaned and validated datasets, and Gold provides aggregated business-ready data for analytics and reporting bronzedf = spark.read.json("/landing/apidata") bronze_df.write.format("delta").save("/bronze/users")"See full answer

    Data Engineer
    Data Pipeline Design
  • Walmart Labs logoAsked at Walmart Labs 
    Add answer
    Data Engineer
    Behavioral
    +5 more
  • 1 answer

    "i said there is hashed, clustered, non-clustered"

    Erjan G. - "i said there is hashed, clustered, non-clustered"See full answer

    Data Engineer
    Technical
  • 2 answers

    "Parquet = reading only the columns you need in a spreadsheet Avro = reading full rows one at a time"

    Dessalew A. - "Parquet = reading only the columns you need in a spreadsheet Avro = reading full rows one at a time"See full answer

    Data Engineer
    Technical
  • Anthropic logoAsked at Anthropic 
    Add answer
    Data Engineer
    Behavioral
    +1 more
  • Uber logoAsked at Uber 
    1 answer

    "Not my answer, but rather the details of this question. It should include the following functions: int insertNewCustomer(double revenue) -> returns a customer ID (assume auto-incremented & 0-based) int insertNewCustomer(double revenue, int referrerID) -> returns a customer ID (assume auto-incremented & 0-based) Set getLowestKCustomersByMinTotalRevenue(int k, double minTotalRevenue) -> returns customer IDs Note: The total revenue consists of the revenue that this customer bring"

    Anzhe M. - "Not my answer, but rather the details of this question. It should include the following functions: int insertNewCustomer(double revenue) -> returns a customer ID (assume auto-incremented & 0-based) int insertNewCustomer(double revenue, int referrerID) -> returns a customer ID (assume auto-incremented & 0-based) Set getLowestKCustomersByMinTotalRevenue(int k, double minTotalRevenue) -> returns customer IDs Note: The total revenue consists of the revenue that this customer bring"See full answer

    Data Engineer
    Coding
Showing 121-140 of 167