Handle Missing Data

You are given a dataset containing some missing values. Your task is to handle these missing values according to the instructions provided. The dataset is as follows:

student_id  name     gpa    credits_completed  year
1           Alice    3.5    15                 Freshman
2           Bob      3.7    NaN                Sophomore
3           Charlie  NaN    45                 Junior
4           David    3.2    50                 NaN
5           Eva      3.9    70                 Senior
6           Frank    NaN    20                 Freshman
7           Grace    3.8    35                 Sophomore
8           Hannah   NaN    35                 Junior
9           Ian      3.4    NaN                Senior
10          Jack     3.1    60                 NaN
11          Kelly    3.6    25                 Sophomore
12          Leo      NaN    NaN                Sophomore
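
To experiment outside the code editor, the dataset above can be rebuilt by hand. The following is a minimal sketch that assumes pandas and NumPy are available; the variable names `df` and `missing_per_column` are illustrative and not part of the exercise.

```python
import numpy as np
import pandas as pd

# Rebuild the exercise dataset; np.nan / None stand in for the NaN cells.
df = pd.DataFrame({
    "student_id": list(range(1, 13)),
    "name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank",
             "Grace", "Hannah", "Ian", "Jack", "Kelly", "Leo"],
    "gpa": [3.5, 3.7, np.nan, 3.2, 3.9, np.nan,
            3.8, np.nan, 3.4, 3.1, 3.6, np.nan],
    "credits_completed": [15, np.nan, 45, 50, 70, 20,
                          35, 35, np.nan, 60, 25, np.nan],
    "year": ["Freshman", "Sophomore", "Junior", None, "Senior", "Freshman",
             "Sophomore", "Junior", "Senior", None, "Sophomore", "Sophomore"],
})

# Count the missing values in each column before deciding how to handle them.
missing_per_column = df.isna().sum()
print(missing_per_column)  # gpa: 4, credits_completed: 3, year: 2
```

Checking the per-column counts first makes it easier to judge whether dropping rows or filling values is the more reasonable choice.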

Practice all the options below to get a feel for how you may handle missing data during your coding interviews.

After completing each option, click the "Reset" button on the code editor to clear your changes and start fresh before working on the next option.

Option 1: Drop all rows with missing data

The interviewer determines that all rows with missing data are useless. Drop all rows with missing data. Your result should look like this:

student_id  name     gpa    credits_completed  year
1           Alice    3.5    15                 Freshman
5           Eva      3.9    70                 Senior
7           Grace    3.8    35                 Sophomore
11          Kelly    3.6    25                 Sophomore
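
One possible way to get this result, sketched under the assumption that the data lives in a pandas DataFrame named `df` (the name and the hand-built data are illustrative), is `DataFrame.dropna()` with its default arguments:

```python
import numpy as np
import pandas as pd

# Hand-built copy of the exercise dataset (an assumption for this sketch).
df = pd.DataFrame({
    "student_id": list(range(1, 13)),
    "name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank",
             "Grace", "Hannah", "Ian", "Jack", "Kelly", "Leo"],
    "gpa": [3.5, 3.7, np.nan, 3.2, 3.9, np.nan,
            3.8, np.nan, 3.4, 3.1, 3.6, np.nan],
    "credits_completed": [15, np.nan, 45, 50, 70, 20,
                          35, 35, np.nan, 60, 25, np.nan],
    "year": ["Freshman", "Sophomore", "Junior", None, "Senior", "Freshman",
             "Sophomore", "Junior", "Senior", None, "Sophomore", "Sophomore"],
})

# With no arguments, dropna() removes every row that has at least one
# missing value in any column and returns a new DataFrame.
complete_rows = df.dropna()
print(complete_rows)
```

`dropna()` leaves `df` itself unchanged, so the original data is still available for the next option.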

Option 2: Drop all rows with missing GPA

The interviewer decides that only the rows with missing gpa values are worthless. Drop such rows. Your result should look like this:

student_id  name     gpa    credits_completed  year
1           Alice    3.5    15                 Freshman
2           Bob      3.7    NaN                Sophomore
4           David    3.2    50                 NaN
5           Eva      3.9    70                 Senior
7           Grace    3.8    35                 Sophomore
9           Ian      3.4    NaN                Senior
10          Jack     3.1    60                 NaN
11          Kelly    3.6    25                 Sophomore
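
A minimal sketch of this option, again assuming a hand-built pandas DataFrame named `df` (illustrative, not part of the exercise), restricts the check to the gpa column through the `subset` parameter of `dropna()`:

```python
import numpy as np
import pandas as pd

# Hand-built copy of the exercise dataset (an assumption for this sketch).
df = pd.DataFrame({
    "student_id": list(range(1, 13)),
    "name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank",
             "Grace", "Hannah", "Ian", "Jack", "Kelly", "Leo"],
    "gpa": [3.5, 3.7, np.nan, 3.2, 3.9, np.nan,
            3.8, np.nan, 3.4, 3.1, 3.6, np.nan],
    "credits_completed": [15, np.nan, 45, 50, 70, 20,
                          35, 35, np.nan, 60, 25, np.nan],
    "year": ["Freshman", "Sophomore", "Junior", None, "Senior", "Freshman",
             "Sophomore", "Junior", "Senior", None, "Sophomore", "Sophomore"],
})

# subset=["gpa"] drops a row only when its gpa is missing; missing values
# in credits_completed or year are left in place.
gpa_known = df.dropna(subset=["gpa"])
print(gpa_known)
```

Rows such as Bob's and David's survive because their gpa is present, even though another column is still NaN.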

Option 3: Replace missing values

Instead of removing rows, the interviewer decides to handle missing values as follows:

  • For missing gpa values, fill them with the mean GPA.
  • For missing credits_completed values, replace them with the mean credits_completed value based on the student's year. For example, if the student is a sophomore, use the average credits_completed of all sophomores.
  • For missing year values, assign the most frequently occurring year to these missing values.

Your result should look like this:

student_id  name     gpa    credits_completed  year
1           Alice    3.5    15                 Freshman
2           Bob      3.7    30                 Sophomore
3           Charlie  3.525  45                 Junior
4           David    3.2    50                 Sophomore
5           Eva      3.9    70                 Senior
6           Frank    3.525  20                 Freshman
7           Grace    3.8    35                 Sophomore
8           Hannah   3.525  35                 Junior
9           Ian      3.4    70                 Senior
10          Jack     3.1    60                 Sophomore
11          Kelly    3.6    25                 Sophomore
12          Leo      3.525  30                 Sophomore
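
  • Mean gpa calculated as: (3.5 + 3.7 + 3.2 + 3.9 + 3.8 + 3.4 + 3.1 + 3.6) / 8 = 3.525
  • Mean credits_completed for Sophomores: (35 + 25) / 2 = 30
  • Mean credits_completed for Seniors: 70, since Eva is the only Senior with a recorded value
  • Most frequent year: Sophomore (4 of the 10 recorded values), so students 4 and 10 are assigned Sophomore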
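
The three fill rules can be sketched in pandas as follows. This is one possible approach rather than the exercise's reference solution; the names `df`, `filled`, and `credits_by_year` are illustrative, and the data is rebuilt by hand. Note the order: the per-year credit means are computed before the missing year values are filled. Filling year first would add David (50) and Jack (60) to the Sophomore group and change its mean from 30 to 42.5.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the exercise dataset (an assumption for this sketch).
df = pd.DataFrame({
    "student_id": list(range(1, 13)),
    "name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank",
             "Grace", "Hannah", "Ian", "Jack", "Kelly", "Leo"],
    "gpa": [3.5, 3.7, np.nan, 3.2, 3.9, np.nan,
            3.8, np.nan, 3.4, 3.1, 3.6, np.nan],
    "credits_completed": [15, np.nan, 45, 50, 70, 20,
                          35, 35, np.nan, 60, 25, np.nan],
    "year": ["Freshman", "Sophomore", "Junior", None, "Senior", "Freshman",
             "Sophomore", "Junior", "Senior", None, "Sophomore", "Sophomore"],
})

filled = df.copy()

# Rule 1: missing gpa -> overall mean gpa (mean() skips NaN by default).
filled["gpa"] = filled["gpa"].fillna(filled["gpa"].mean())

# Rule 2: missing credits_completed -> mean of the student's year group.
# transform("mean") returns one value per row, aligned to the row index.
credits_by_year = filled.groupby("year")["credits_completed"].transform("mean")
filled["credits_completed"] = filled["credits_completed"].fillna(credits_by_year)

# Rule 3: missing year -> most frequent year. mode() returns a Series
# (there may be ties), so take its first entry.
filled["year"] = filled["year"].fillna(filled["year"].mode()[0])

print(filled)
```

The filled gpa values may print with small floating-point noise, so rounding (for example `filled["gpa"].round(3)`) can help when comparing against the expected table.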

Option 4: Interpolate

Unfortunately, we won't be interpolating on this dataset. Check out the coding lesson on interpolation to learn more!