Data Engineer Interview Questions

Review this list of 14 data engineer interview questions and answers verified by hiring managers and candidates.


Asked at Amazon • 3 months ago
Write a query to find the top 3 unique salaries in each department and list all employees who have those salaries.
Data Engineer
Coding
4 answers
Sreeram reddy B. - "SELECT employeename, employeeid, salary, department, DR FROM (SELECT employeename, employeeid, salary, department, DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS DR FROM employee) ranked WHERE DR <= 3 ORDER BY department, DR"
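The DENSE_RANK approach in the answer above can be tried end to end with a small sketch. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions); the sample rows are hypothetical, and the table and column names simply follow the answer.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employeeid INTEGER PRIMARY KEY, employeename TEXT, "
    "salary INTEGER, department TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", 300, "Eng"),
        (2, "Ben", 300, "Eng"),  # ties share a rank under DENSE_RANK
        (3, "Cy", 250, "Eng"),
        (4, "Di", 200, "Eng"),
        (5, "Ed", 150, "Eng"),   # fourth distinct salary in Eng, filtered out
        (6, "Flo", 120, "Ops"),
        (7, "Gus", 100, "Ops"),
    ],
)

TOP3_SQL = """
SELECT employeename, employeeid, salary, department, dr
FROM (
    SELECT employeename, employeeid, salary, department,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr
    FROM employee
) AS ranked
WHERE dr <= 3
ORDER BY department, dr, employeeid
"""

top3_rows = conn.execute(TOP3_SQL).fetchall()
for row in top3_rows:
    print(row)
```

DENSE_RANK is used rather than RANK or ROW_NUMBER because tied salaries should count as one "unique salary" and must not consume extra rank positions.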
Asked at Databricks, DoorDash • 4 months ago
Design a database schema for a fitness app.
Data Engineer
Data Modeling
3 answers
Sreeram reddy B. - "User table: userid, username, email, phonenumber, accountcreateddate. Exercises table (types of exercise such as indoor walk, outdoor walk, running, stairs, cycling, swimming): exerciseid, exercisetype. Date table: dateid, date, day, month, year. Session table: userid, sessiondateid (linked to dateid in the date table), exerciseid, distance covered, calories spent, starttime, endtime."
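One way to read the schema in the answer above is as the following DDL sketch, run here against an in-memory SQLite database through Python's sqlite3 module. The column types, the constraints, the sessionid surrogate key, and the calendar_date column name (used in place of "date") are assumptions added for illustration, not part of the original answer.

```python
import sqlite3

# Table and column names follow the answer; types and constraints are assumed.
SCHEMA = """
CREATE TABLE users (
    userid             INTEGER PRIMARY KEY,
    username           TEXT NOT NULL,
    email              TEXT NOT NULL UNIQUE,
    phonenumber        TEXT,
    accountcreateddate TEXT NOT NULL
);
CREATE TABLE exercises (
    exerciseid   INTEGER PRIMARY KEY,
    exercisetype TEXT NOT NULL UNIQUE
);
CREATE TABLE dates (
    dateid        INTEGER PRIMARY KEY,
    calendar_date TEXT NOT NULL UNIQUE,
    day INTEGER, month INTEGER, year INTEGER
);
CREATE TABLE sessions (
    sessionid        INTEGER PRIMARY KEY,  -- hypothetical surrogate key
    userid           INTEGER NOT NULL REFERENCES users(userid),
    sessiondateid    INTEGER NOT NULL REFERENCES dates(dateid),
    exerciseid       INTEGER NOT NULL REFERENCES exercises(exerciseid),
    distance_covered REAL,
    calories_spent   REAL,
    starttime        TEXT,
    endtime          TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(SCHEMA)

# One hypothetical row per table to exercise the relationships.
conn.execute("INSERT INTO users VALUES (1, 'sam', 'sam@example.com', NULL, '2024-01-01')")
conn.execute("INSERT INTO exercises VALUES (1, 'running')")
conn.execute("INSERT INTO dates VALUES (20240105, '2024-01-05', 5, 1, 2024)")
conn.execute(
    "INSERT INTO sessions VALUES (1, 1, 20240105, 1, 5.2, 410.0, '07:00', '07:32')"
)

session_summary = conn.execute("""
    SELECT u.username, e.exercisetype, d.calendar_date, s.distance_covered
    FROM sessions s
    JOIN users u     ON u.userid = s.userid
    JOIN exercises e ON e.exerciseid = s.exerciseid
    JOIN dates d     ON d.dateid = s.sessiondateid
""").fetchall()
print(session_summary)
```

The layout is a small star schema: sessions is the fact table, and users, exercises, and dates are its dimensions.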
Asked at Amazon • 3 months ago
Write a query to find all dates where the stadium had three or more consecutive days with attendance of 100 or more people.
Data Engineer
Coding
2 answers
Hayatu H. - "How do you find consecutive days for login (MySQL, SQL, date, subquery, MySQL 5.7, development)? Trausti Thor Johannsson (been using MySQL for more than 16 years), Dec 27: There are functions like DATEDIFF but there are also BETWE"
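A common way to approach this question is the "gaps and islands" pattern, sketched below with Python's sqlite3 module (SQLite 3.25 or later assumed for window functions). This is a swapped-in technique, not the method from the answer above. The stadium table, its visit_date and people columns, and the sample rows are hypothetical, and the sketch assumes one row per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per calendar date.
conn.execute("CREATE TABLE stadium (visit_date TEXT PRIMARY KEY, people INTEGER)")
conn.executemany(
    "INSERT INTO stadium VALUES (?, ?)",
    [
        ("2024-01-01", 50),
        ("2024-01-02", 120), ("2024-01-03", 130), ("2024-01-04", 140),  # streak of 3
        ("2024-01-05", 90),
        ("2024-01-06", 150), ("2024-01-07", 160),                        # streak of 2
        ("2024-01-08", 80),
        ("2024-01-09", 110), ("2024-01-10", 100),
        ("2024-01-11", 105), ("2024-01-12", 200),                        # streak of 4
    ],
)

STREAK_SQL = """
WITH busy AS (
    -- Within a run of consecutive dates, (day number - row number) is constant,
    -- so it serves as a group key for each streak.
    SELECT visit_date, people,
           CAST(julianday(visit_date) AS INTEGER)
             - ROW_NUMBER() OVER (ORDER BY visit_date) AS grp
    FROM stadium
    WHERE people >= 100
),
sized AS (
    SELECT visit_date, people,
           COUNT(*) OVER (PARTITION BY grp) AS streak_len
    FROM busy
)
SELECT visit_date, people
FROM sized
WHERE streak_len >= 3
ORDER BY visit_date
"""

streak_dates = [row[0] for row in conn.execute(STREAK_SQL)]
print(streak_dates)
```

Filtering on attendance before numbering the rows is what makes the trick work: any day below 100 breaks the sequence, so the dates on either side land in different groups.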
Asked at Meta (Facebook) • 3 months ago
Given a bookstore database schema, write SQL queries using joins and aggregations to answer questions about sales, inventory, and customer data.
Data Engineer
Coding
1 answer
Bala G. - "SELECT s.Sale_Date, SUM(si.Quantity * si.SalePrice) AS TotalRevenue FROM Sales s JOIN SaleItems si ON s.SaleID = si.Sale_ID GROUP BY s.Sale_Date ORDER BY s.Sale_Date;"
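The daily revenue query in the answer above can be checked against a tiny dataset. The sketch below uses Python's sqlite3 module; the bookstore schema was not given in the question, so the two tables, their column types, and the sample rows are hypothetical, with column names taken from the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; only the columns the query touches are modeled.
conn.executescript("""
CREATE TABLE Sales (
    SaleID    INTEGER PRIMARY KEY,
    Sale_Date TEXT NOT NULL
);
CREATE TABLE SaleItems (
    Sale_ID   INTEGER NOT NULL REFERENCES Sales(SaleID),
    Quantity  INTEGER NOT NULL,
    SalePrice REAL NOT NULL
);
""")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [(1, "2024-03-01"), (2, "2024-03-01"), (3, "2024-03-02")],
)
conn.executemany(
    "INSERT INTO SaleItems VALUES (?, ?, ?)",
    [(1, 2, 10.0), (1, 1, 5.5), (2, 3, 4.0), (3, 1, 20.0)],
)

# Query text as given in the answer: join line items to sales, sum per day.
REVENUE_SQL = """
SELECT s.Sale_Date, SUM(si.Quantity * si.SalePrice) AS TotalRevenue
FROM Sales s
JOIN SaleItems si ON s.SaleID = si.Sale_ID
GROUP BY s.Sale_Date
ORDER BY s.Sale_Date
"""

daily_revenue = conn.execute(REVENUE_SQL).fetchall()
print(daily_revenue)
```

Because the query uses an inner join, a day whose sales have no line items would not appear in the result; a LEFT JOIN with COALESCE would be one way to report such days as zero.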
Asked at Amazon • 3 months ago
Create geographic and demographic dashboards for weekly, monthly, and yearly analytics using order data (100M daily records for 5 years) and customer data (1B customers).
Data Engineer
Data Modeling
1 answer
Hayatu H. - "What do all data scientists need to know about how to work with very large datasets? Corrin Lakeland, M.S. Data Science, University of St. Thomas, St. Paul (2018), 6y, Data Science consultant and manager. Upvoted by [Tom Halloin](https://www.quora"

Asked at Databricks • 7 months ago
What's the difference between a data lakehouse and a data warehouse?
Data Engineer
Data Pipeline Design
3 answers
Kshitij I. - "A data lake and a data warehouse are both places that allow an organization to store large amounts of data. When swimming in a lake, one would expect to come across all sorts of stuff: floating twigs, fish in the water, stones, chemicals, and sometimes maybe even a snake. Similarly, a data lake stores all forms of data that the company has without any indexing. The data is available at any time but needs to be cleaned up and reorganized before it can be used for any type of analysis. A"
Asked at Databricks • 7 months ago
How would you handle slow query performance for a single-user SQL endpoint in Databricks, where all sequentially run queries are affected?
Data Engineer
Data Pipeline Design
Asked at Google • 7 months ago
When is Hadoop better than PySpark?
Data Engineer
Data Pipeline Design
1 answer
Joshua R. - "Hadoop is better than PySpark when you are dealing with extremely large-scale, batch-oriented, non-iterative workloads where in-memory computing isn't feasible or necessary, like log storage or ETL workflows that don't require high response times. It's also better in situations where the Hadoop ecosystem is already deeply embedded and where there is a need for resource-conscious, fault-tolerant computation without the overhead of Spark's memory constraints. In such scenarios, Hadoop's disk-b"
Asked at Databricks • 7 months ago
How would you handle scheduling dependencies between two nightly Jobs to ensure the second Job does not fail if the first Job runs longer than expected?
Data Engineer
Data Pipeline Design
Asked at Databricks • 7 months ago
How would you handle a task in a nightly job that fails unexpectedly during 10 percent of the runs?
Data Engineer
Data Pipeline Design
Asked at Databricks • 7 months ago
What is a Medallion Architecture?
Data Engineer
Data Pipeline Design
Asked at Databricks • 7 months ago
When should you use Delta Live Tables over standard data pipelines built on Spark and Delta Lake?
Data Engineer
Data Pipeline Design
Asked at Databricks • 7 months ago
What is Delta Lake?
Data Engineer
Data Pipeline Design
Asked at Databricks • 7 months ago
When should you use a job cluster instead of an all-purpose cluster?
Data Engineer
Data Pipeline Design
