Top Product Lines
MediumPremium
Given the schema shown below, write a solution to find the product lines that have the highest sales volume, ranked in descending order (e.g. highest sales volume = #1, lowest sales volume = #10).
Sales volume is not equal to sales revenue. Sales volume measures the quantity of products or services sold over a certain period.
users products +---------------+---------+ +-----------------+---------+ +--| id | int | +-----| id | int | | | first_name | varchar | | | name | varchar | | | last_name | varchar | | +->| product_line_id | date | | | user_city | int | | | | stock | int | | | email | int | | | +-----------------+---------+ | +---------------+---------+ | | | | | | transactions | | product_lines | +---------------+---------+ | | +--------+--------+ | | id | int |<----+ +--| id | int | +---->| customer_id | int | | name | varchar| | product_id | int | +--------+--------+ | amount | int | | currency_code | varchar | | date | date | +---------------+---------+ exchange_rate +----------------------+---------+ | id | int | | source_currency_code | varchar | | target_currency_code | varchar | | rate | numeric | +----------------------+---------+
Your solution should return the product line name and the total number of products sold:
rank | name | volume -----+---------+------- int | varchar | int
We can follow the following steps to rank the product lines by sales volume:
- Join
transactionswithproductsto get product line IDs. - Aggregate sales volume by product line ID.
- Join with
product_linesto get product line names. We can rename columns for clarity. - Rank product lines based on sales volume in descending order. Don't forget to check the data type of
rank. - Sort and re-order columns before returning the result
Pythondef rank_product_lines(transactions: pd.DataFrame,
products: pd.DataFrame,
product_lines: pd.DataFrame) -> pd.DataFrame:
# Join transactions with products to get product line IDs
transactions_with_products = pd.merge(transactions, products, left_on='product_id', right_on='id')
# Aggregate sales volume by product line ID
sales_volume_by_line = transactions_with_products.groupby('product_line_id')['amount'].sum().reset_index()
# Join with product_lines to get product line names
sales_volume_by_line = pd.merge(sales_volume_by_line, product_lines, left_on='product_line_id', right_on='id')
# Rename columns for clarity
sales_volume_by_line = sales_volume_by_line[['name', 'amount']]
sales_volume_by_line.columns = ['name', 'volume']
# Rank product lines based on sales volume in descending order
sales_volume_by_line['rank'] = sales_volume_by_line['volume'].rank(method='dense', ascending=False).astype(int)
# Sort by rank
ranked_product_lines = sales_volume_by_line.sort_values(by='rank').reset_index(drop=True)
# Reorder columns to have 'rank' first
ranked_product_lines = ranked_product_lines[['rank', 'name', 'volume']]
return ranked_product_lines
Schema is wrong - id from product is mapped to id from transactions, id from product should point to product_id in transcations table