Rank Salary by Department
Given the schema shown below, write a solution to rank each employee based on their salary (highest salary first) within their respective departments. The results should be ordered by department name.
employees projects +---------------+---------+ +---------------+---------+ | id | int |<----+ +->| id | int | | first_name | varchar | | | | title | varchar | | last_name | varchar | | | | start_date | date | | salary | int | | | | end_date | date | | department_id | int |--+ | | | budget | int | +---------------+---------+ | | | +---------------+---------+ | | | departments | | | employees_projects +---------------+---------+ | | | +---------------+---------+ | id | int |<-+ | +--| project_id | int | | name | varchar | +-----| employee_id | int | +---------------+---------+ +---------------+---------+
Your function should return a result in the following format:
department_name | rank | employee_id | first_name | last_name | salary ----------------+------+-------------+------------+-----------+-------- varchar | int | int | varchar | varchar | int
First, we merge the DataFrames. The pd.merge function is used to combine employees with departments based on the department_id and id columns.
Next, we rank employees according to their salary. The groupby method groups the merged DataFrame by department_name, and rank is calculated for each employee’s salary within their department. The method='dense' ensures that ranks are consecutive integers.
Lastly, we sort the results. The result is sorted by department_name and rank to meet the specified ordering requirement.
Pythondef rank_salary_by_department(departments: pd.DataFrame, employees: pd.DataFrame) -> pd.DataFrame:
# Merge employees with departments on department_id
merged_df = pd.merge(employees, departments, left_on='department_id', right_on='id')
# Drop the redundant 'id' column from departments after merge
merged_df = merged_df.drop(columns=['id_y'])
# Rename columns for clarity
merged_df = merged_df.rename(columns={
'id_x': 'employee_id',
'name': 'department_name'
})
# Rank employees based on salary within their department
merged_df['rank'] = merged_df.groupby('department_name')['salary'].rank(ascending=False, method='dense').astype(int)
# Select and order the columns as required
result_df = merged_df[['department_name', 'rank', 'employee_id', 'first_name', 'last_name', 'salary']]
# Sort the results by department name and then by rank
result_df = result_df.sort_values(by=['department_name', 'rank'])
return result_df
The question is incomplete --- the code only passes if you return the data frame sorted by BOTH department name AND rank. While in the problem description, it mentions to only rank by department name: "The results should be ordered by department name."
Not a big difference I know, but students shouldn't need to look into the solution to get the necessary knowledge to answer the question.