Average Distance Between Cities
MediumPremium
You are given a table with varying distances between different pairs of cities recorded by different car GPS systems. The table has the following columns:
- origin: Name of the origin city in the pair
- destination: Name of the destination city in the pair
- distance: Distance between
originanddestination
Write a function that calculates the average distance between each pair of cities and returns a new table with the columns city_pair and average_distance.
city_pair | average_distance ----------+------------------------ varchar | float (rounded to 2 d.p)
- The
city_paircolumn should contain the pair of cities in alphabetical order separated by a hyphen (e.g., “CityA-CityB”). average_distanceshould be rounded to the nearest 2 decimal places- The results should be ordered according to ascending
average_distance
Example
Based on this table, the average distance between CityA and CityB = (100 + 110 + 130)/3 = 113.33 (round to the nearest 2 decimal places). Your solution should thus return:
We can solve this problem in the following steps:
- Normalizing City Pairs: We create a new column
city_pairwhere we combine the origin and destination cities in alphabetical order. This ensures that pairs likeCityAandCityBare always represented asCityA-CityB, regardless of the order in the original data. - Grouping and Calculating Average: We group the DataFrame by the
city_paircolumn and calculate the mean of the distance column for each group. This step gives us the average distance for each pair of cities. - Rounding and Sorting: We round the average distances to two decimal places for precision. Finally, we sort the results by
average_distancein ascending order to meet the requirements of the problem statement.
Pythondef find_average_distance(gps_data: pd.DataFrame) -> pd.DataFrame:
# Create a new column with city pairs in alphabetical order
gps_data['city_pair'] = gps_data.apply(
lambda row: '-'.join(sorted([row['origin'], row['destination']])), axis=1
)
# Group by the city pair and calculate the average distance
result = gps_data.groupby('city_pair')['distance'].mean().reset_index()
# Round the average distances to two decimal places
result['distance'] = result['distance'].round(2)
# Rename columns to match the expected output
result.columns = ['city_pair', 'average_distance']
# Sort by average_distance in ascending order
result = result.sort_values(by='average_distance').reset_index(drop=True)
return result
select (case when scity < dcity then scity else dcity end) as city1, (case when scity < dcity then dcity else scity end) as city2, avg(distance) distance from city group by 1,2