Amazon Order Status
Amazon is a global e-commerce company that allows vendors to sell their products online to customers. Customers can order products and track their orders' status, such as 'Pending', 'Shipped', 'Delivered', etc.
You're given a table, orders, with the following columns:
order_id(integer): a unique identifier for each orderorder_date(date): the date the order status was updatedstatus(string): the status of the order, e.g., 'Pending', 'Shipped', 'Delivered', etc.
Write a SQL query that returns a table with the order_id, status, start_date, and end_date for each status period of a particular order. If a status is the first for that order, then the end_date should be NULL."
-
Common Table Expression (CTE) -
StatusChanges: We employ a CTE namedStatusChangesto prepare an intermediary result. CTEs allow us to create temporary result sets which can simplify the structure of our main query. -
Window Functions (
LEADandLAG):- We use
LEADto peek at theorder_dateof the next row in the sequence (essentially the date of the next status). This helps us determine theend_datefor a given status. - Similarly,
LAGis used to look at the previous row'sstatus, allowing us to compare the current status with its predecessor and determine if there's been a change.
Both
LEADandLAGare Window Functions, which means they operate over a set of table rows that are somehow related to the current row. This relation is established through thePARTITION BYandORDER BYclauses. - We use
-
Filtering Strategy in
WHEREClause: We use a condition that checks if the current status differs from the previous status (status != prev_status) or if the previous status is simply nonexistent (prev_status IS NULL). This ensures that we only capture rows representing genuine status changes or the initial status for an order. This step is crucial to distill the essence of our objective.
WITH StatusChanges AS (
SELECT
order_id,
order_date,
status,
LAG(status) OVER(PARTITION BY order_id ORDER BY order_date) AS prev_status
FROM orders
),
FilteredChanges AS (
SELECT
order_id,
order_date,
status
FROM StatusChanges
WHERE status != prev_status OR prev_status IS NULL
)
SELECT
order_id,
status,
order_date AS start_date,
LEAD(order_date) OVER(PARTITION BY order_id ORDER BY order_date) AS end_date
FROM FilteredChanges
ORDER BY order_id, order_date;
-- The text of the task is a bit confusing. If the status is repeated several -- times, then in the end you should show as start_date the date of the first -- occurrence, and in end_date the date of the last occurrence of this status, -- and not the date of the beginning of the next status
with t1 as (select order_id, status, order_date as start_date, lead(order_date) over (partition by order_id order by order_date) as end_date, ifnull(lag(status) over (partition by order_id order by order_date),'no') as prev_status from orders order by order_id, order_date) select order_id, status, start_date, end_date from t1 where status!=prev_status order by order_id, start_date