Improving Students
HardPremium
Given the schema below, write a function to find the students that have consistently improved academically throughout the years.
Your output should be in the following format:
- The columns 2021, 2022 and 2023 should be populated with their respective yearly GPAs
average_gpashould be rounded to the nearest 2 decimal places
To solve this problem, we recommend the use of a pivot table.
First, we can use the pivot method to transform the data such that each row is a student, and each column represents the GPA for a specific year.
Dataset:
Pivot table:
Next, we can use the diff method to compute the difference in GPA year over year. The diff_df table looks like this:
We can then check if the differences are positive year-on-year and filter out students who meet that condition.
Finally, we can compute the average GPA for each student that has consistently improved.
Pythondef find_improving_students(transcript: pd.DataFrame) -> pd.DataFrame:
# Create a pivot table with student_id as index, years as columns, and yearly_gpa as values
pivot_df = transcript.pivot(index='student_id', columns='year', values='yearly_gpa')
# Sort columns to ensure the years are in order
pivot_df = pivot_df.sort_index(axis=1)
# Calculate the difference between yearly GPAs
diff_df = pivot_df.diff(axis=1)
diff_df = diff_df.fillna(pivot_df) # fill NaN with the GPA values
# Calculate if GPA is consistently improving
improving_mask = (diff_df > 0).all(axis=1)
# Filter students who have consistently improved
improving_students = pivot_df[improving_mask]
# Calculate the average GPA for each student and round to 2 decimal places
results_df = improving_students.copy()
results_df['average_gpa'] = improving_students.mean(axis=1).round(2)
return results_df.reset_index()
The solution produces the same result as the 'prescribed solution' yet it does not get accepted In the test results section
transcript['year'] = transcript['year'].astype(str)
df = pd.pivot_table(data = transcript, index = 'student_id', columns = 'year', values = 'yearly_gpa', aggfunc = 'mean').reset_index()
df = df[(df['2021'] < df['2022']) & (df['2022'] < df['2023'])]
df['average_gpa'] = df[['2021', '2022', '2023']].mean(axis=1).round(2)
return df