Top 25 Python Data Science Interview Questions in 2024

Data Science
Exponent Team

Data science interviews often include Python coding questions and statistical analysis.

These questions test your general Python coding skills, as well as your knowledge of popular data science Python libraries such as Pandas and NumPy.

Below, we've compiled a list of the most important Python data science interview questions to help you ace your upcoming interviews.

Each question includes a breakdown of what interviewers expect in your answer and code snippets where applicable.

👋
This guide contains excerpts from Exponent's complete data science interview course and software engineering interview course created with data scientists and engineers from Spotify, Amazon, and Instacart.

Sneak peek:
- Watch a Tinder DS answer: Determine the sample size for an experiment.
- Watch a senior DS answer: What is a P-value?
- Practice yourself: Predict results from a fair coin flip.

This guide was written and compiled by Derrick Mwiti, a senior data scientist and course instructor.

Python Fundamentals

These Python data science interview questions will test your knowledge of the basics of Python.

Python is listed as an essential skill in data science job descriptions for companies like Microsoft, Google, Apple, and more.

1. Which is faster, Python lists or NumPy arrays? Why?

For numerical operations on large data, NumPy arrays are generally faster than Python lists.

NumPy arrays are specialized for numerical computation and efficient mathematical and statistical operations.

  • NumPy arrays hold elements of a single data type in one contiguous block of memory.
  • Python lists can hold mixed types. A list stores references to objects that may be scattered across memory.

Contiguous, typed storage lets NumPy run an operation over the whole array in compiled code, with better cache locality and no per-element type checks. It also uses less memory per element than a list of Python objects.
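As a rough illustration of the difference, the sketch below doubles the same values with a list comprehension and with a single array operation. It assumes NumPy is installed; the variable names are made up for this example, and it checks only that the results match, not the timing.

```python
import numpy as np

values = list(range(1_000))

# List: a Python-level loop touches one object at a time.
doubled_list = [v * 2 for v in values]

# Array: one vectorized operation over a typed, contiguous buffer.
arr = np.array(values)
doubled_arr = arr * 2

print(arr.dtype)                             # a single dtype for every element
print(arr.flags["C_CONTIGUOUS"])             # True
print(doubled_arr.tolist() == doubled_list)  # True
```

To compare speed on your own machine, you could time both versions with the standard timeit module.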

2. What is the difference between map() and applymap()?

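map and applymap both apply a function elementwise in Pandas.

However, map is applied to a Series (a single column), while applymap is applied to every cell of a DataFrame. Since pandas 2.1, applymap is deprecated in favor of DataFrame.map.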
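A minimal sketch of the two calls, assuming pandas is installed. The column names and the helper name `elementwise` are invented for illustration; the hasattr check is only there so the snippet runs on both older and newer pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Series.map: elementwise over one column.
squared_a = df["a"].map(lambda v: v ** 2)

# Elementwise over the whole DataFrame. Newer pandas exposes this as
# DataFrame.map; older versions only have DataFrame.applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
squared_df = elementwise(lambda v: v ** 2)

print(squared_a.tolist())          # [1, 4]
print(squared_df.values.tolist())  # [[1, 9], [4, 16]]
```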

3. Explain zip() and enumerate().

Given multiple iterables, zip yields tuples that pair up the items at each position, stopping when the shortest iterable is exhausted.

Each tuple has one element per iterable passed, and the number of tuples equals the length of the shortest iterable.

Python
list1 = [1, 2, 3, 4, 5]
list2 = ['cow', 'goat', 'hen']
list3 = ['the', 'quick', 'brown', 'fox']
print(list(zip(list1, list2, list3)))
# Output: [(1, 'cow', 'the'), (2, 'goat', 'quick'), (3, 'hen', 'brown')]

enumerate pairs each item of an iterable with a counter, yielding (index, item) tuples. The count starts at 0 by default.

This makes it possible to access both the position of an item in a list and its value.

Python
list3 = ['the', 'quick', 'brown', 'fox']
e = enumerate(list3)
print(list(e))  # Output: [(0, 'the'), (1, 'quick'), (2, 'brown'), (3, 'fox')]

4. What is a lambda function?

A lambda function is an anonymous function declared without the def keyword.

A lambda function has only one expression but can have multiple arguments. It can make code more concise but less readable.

Python
def myfunc(n):
    return lambda a, b, c: a + b + c * n

my_func = myfunc(3)
print(my_func(5, 6, 2))  # Output: 17

5. How do map, reduce, and filter functions work?

  • map: Applies a function to each item in an iterable.
Python
def myfunc(n):
    return n ** 2

x = map(myfunc, (1, 2, 3))
print(list(x))  # Output: [1, 4, 9]
  • filter: Keeps only the items for which the function returns a truthy value and yields them lazily as a new iterator.
Python
names = ["Derrick", "Dennis", "Joe"]

def starts_with_d(name):
    return name.startswith("D")

final_names = filter(starts_with_d, names)
for name in final_names:
    print(name)
# Output:
# Derrick
# Dennis
  • reduce: Applies a function from left to right, reducing the iterable to a single value.
Python
from functools import reduce

print(reduce(lambda x, y: x + y, [1, 2, 3, 4, 5]))  # Output: 15

6. What is the difference between del, clear(), remove(), and pop()?

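  • del: A statement (not a method) that deletes a name binding, or an item or slice of a list.
Python
my_list = [1, "two", 3, "four"]
del my_list[0]  # delete one item
print(my_list)  # Output: ['two', 3, 'four']
del my_list     # delete the variable itself
print(my_list)  # NameError: name 'my_list' is not defined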
  • clear: Removes all items in a list.
Python
my_list = [1, "two", 3, "four"]
my_list.clear()
print(my_list)  # Output: []
  • remove: Deletes the first occurrence of a value.
Python
my_list = [1, "two", 3, "four"]
my_list.remove(1)
my_list.remove("four")
print(my_list)  # Output: ['two', 3]
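  • pop: Removes and returns the item at the specified position (the last item by default).
Python
my_list = [1, "two", 3, "four"]
my_list.pop(1)  # removes and returns 'two'
my_list.pop(2)  # removes and returns 'four'
print(my_list)  # Output: [1, 3]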

7. What is a Python module? How is it different from a package?

A module is a file containing Python definitions and statements. A package is a collection of Python modules.

Organizing code in a modular format is better than dumping all functions in a single file.

In practice, a package is a directory of modules. It usually contains a file named __init__.py, which runs when the package is first imported and can hold initialization code.

8. How is exception handling achieved in Python?

An exception is an error raised at runtime when an operation cannot complete, such as trying to open a non-existent file.

Python
with open('somefile.txt') as file:
    read_data = file.read()
# FileNotFoundError: [Errno 2] No such file or directory: 'somefile.txt'

Exception handling is important because an unhandled exception stops the execution of the program.

You can handle exceptions with a try statement. Code that might fail goes in the try block, and each except block handles a specific exception type.

Python
try:
    with open('somefile.txt') as file:
        read_data = file.read()
except FileNotFoundError as error:
    print(f"There is an error: {error}")
# Output: There is an error: [Errno 2] No such file or directory: 'somefile.txt'
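Interviewers often follow up by asking about the optional else and finally clauses. The sketch below shows one way they fit together; safe_divide is a hypothetical helper written for this example.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None if the division fails."""
    try:
        result = numerator / denominator
    except ZeroDivisionError as error:
        print(f"Cannot divide: {error}")
        return None
    else:
        # Runs only when the try block raised no exception.
        return result
    finally:
        # Runs in every case, even when a return is pending.
        print("Division attempted")

print(safe_divide(10, 4))  # 2.5
print(safe_divide(1, 0))   # None
```

Catching a specific exception type, rather than using a bare except, keeps unrelated bugs from being silently swallowed.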

9. What is the difference between return and yield keywords?

  • return: Ends the function's execution and sends a value back to the caller. Code after a return statement in the same block never runs.
Python
def tryexponent():
    return "www.tryexponent.com"
    print("Trying exponent!")  # This will not be executed

print(tryexponent())  # Output: www.tryexponent.com
  • yield: Turns the function into a generator. Each yield hands a value to the caller and pauses the function, which resumes from the same point when the next value is requested.
Python
def gen_func(x):
    for i in range(x):
        yield i

generator = gen_func(10)
print(next(generator))  # Output: 0
print(next(generator))  # Output: 1
for x in generator:
    print(x)  # Output: 2, 3, 4, 5, 6, 7, 8, 9

10. What are global and local variables in Python?

  • A local variable is defined inside a function and can only be accessed within that function.
  • A global variable is defined at the top level of a module, outside any function or class, and can be read from anywhere in that module. To reassign it inside a function, declare it with the global keyword.
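A short sketch of the two scopes, run as a top-level script. The names counter, increment_local, and increment_global are invented for this example.

```python
counter = 0  # global variable

def increment_local():
    counter = 100  # local variable; shadows the global inside this function
    return counter

def increment_global():
    global counter  # rebind the module-level name
    counter += 1
    return counter

print(increment_local())   # 100
print(counter)             # 0 (the global is unchanged)
print(increment_global())  # 1
print(counter)             # 1
```

Without the global declaration, the assignment in increment_global would raise UnboundLocalError, because Python would treat counter as a local name.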

11. Write a function to check if a given string is a palindrome.

A palindrome is a word that reads the same backward as forward, such as "racecar" or "mom."

Python
def is_palindrome(word):
    return word == word[::-1]

print(is_palindrome("madam"))  # Output: True

12. What are decorators in Python? How are they used?

A decorator is a design pattern that extends the behavior of a function (or class) without changing its source code. In Python, a decorator is a callable that takes a function and returns a replacement, usually a wrapper around the original.

This is possible because functions are first-class citizens in Python.

They can be

  • returned from a function,
  • passed as an argument,
  • and assigned to a variable.
Python
def titlecase_decorator(function):
    def wrapper():
        result = function()
        return result.title()
    return wrapper

@titlecase_decorator
def make_title():
    return 'learning python decorators'

print(make_title())  # Output: Learning Python Decorators
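A common follow-up is how to decorate functions that take arguments without losing their metadata. The sketch below is one way to do it with functools.wraps; the log_calls decorator and its calls attribute are hypothetical names invented for this example.

```python
import functools

def log_calls(function):
    """Hypothetical decorator that records each call before running it."""
    @functools.wraps(function)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return function(*args, **kwargs)
    wrapper.calls = []
    return wrapper

@log_calls
def add(a, b=0):
    """Add two numbers."""
    return a + b

print(add(2, b=3))   # 5
print(add.__name__)  # add
print(add.calls)     # [((2,), {'b': 3})]
```

Without functools.wraps, add.__name__ would report wrapper, which makes debugging and documentation harder.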

13. What are *args and **kwargs in Python?

  • *args collects extra positional (non-keyword) arguments into a tuple.
  • **kwargs collects extra keyword arguments into a dictionary.
Python
def add(*args):
    result = 0
    for value in args:
        result += value
    return result

print(add(10, 25, 27))  # Output: 62

def add_named(**kwargs):
    result = 0
    for value in kwargs.values():
        result += value
    print("The answer is {}".format(result))

add_named(no_one=2, no_two=3)  # Output: The answer is 5

14. What’s the difference between shallow copy and deep copy?

  • A deep copy creates a new object and recursively copies everything nested inside it. Changes made to the copy never affect the original.
  • A shallow copy creates a new outer object but fills it with references to the same nested objects. Adding or replacing top-level items in the copy leaves the original alone, but mutating a shared nested object (such as an inner list) shows up in both.
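The difference is easiest to see with a nested list. This is a small sketch using the standard copy module; the variable names are arbitrary.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

shallow[0].append(99)   # mutates the inner list shared with original
deep[1].append(-1)      # touches only the deep copy's own inner list
shallow.append([5, 6])  # the outer lists are separate objects

print(original)  # [[1, 2, 99], [3, 4]]
print(shallow)   # [[1, 2, 99], [3, 4], [5, 6]]
print(deep)      # [[1, 2], [3, 4, -1]]
```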
ℹ️
This interview question was asked at Microsoft. "Explain stack and heap memory allocation."

Data Science with Python

These are interview questions that specifically test your ability to use Python to solve data science problems.

15. What is the difference between indexing and slicing in NumPy?

  • Indexing accesses elements at a certain index in a NumPy array.
  • Slicing involves accessing a subset of the array within a range.

Example of indexing:

Python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print('3rd element on 1st row:', arr[0, 2])
# Output: 3rd element on 1st row: 3

Example of slicing:

Python
import numpy as np

matrix = np.arange(1, 17).reshape(4, 4)
# [start_row:end_row, start_column:end_column], end indices excluded
print(matrix[2:4, 2:4])
# Output:
# [[11 12]
#  [15 16]]
ℹ️
This interview question was asked at Apple. "Implement batch normalization using NumPy."

16. What is the difference between .iloc and .loc?

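  • .loc selects by label (index and column names). Label slices include the end label.
  • .iloc selects by integer position, starting at 0. Position slices exclude the end, as in regular Python slicing.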
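A small sketch of both accessors on the same DataFrame, assuming pandas is installed. The row labels and the score column are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"score": [90, 85, 70]},
    index=["ann", "bob", "cat"],  # hypothetical row labels
)

print(df.loc["bob", "score"])              # 85, selected by label
print(df.iloc[1, 0])                       # 85, selected by position
print(df.loc["ann":"bob"].index.tolist())  # ['ann', 'bob'] (end label included)
print(df.iloc[0:1].index.tolist())         # ['ann'] (end position excluded)
```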

17. What is the difference between merge, join, and concatenate?

  • merge combines DataFrames on one or more key columns (or indexes), like a SQL join. By default it performs an inner join, keeping only the keys present in both frames.
  • join combines DataFrames on their index. By default it performs a left join: every row of the left frame is kept, and rows with no match in the right frame get NaN in the right frame's columns.
  • concatenate (pd.concat in Pandas) stacks Pandas objects along a particular axis, for example by rows or columns.
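The three operations can be compared on two small frames. This is a sketch assuming pandas is installed; the frames and their columns are invented for the example.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [20, 30, 40]})

# merge: inner join on the shared "id" column by default.
merged = pd.merge(left, right, on="id")
print(merged["id"].tolist())            # [2, 3]

# join: left join on the index by default; id 1 has no match.
joined = left.set_index("id").join(right.set_index("id"))
print(joined["score"].isna().tolist())  # [True, False, False]

# concat: stack rows; columns missing in one frame are filled with NaN.
stacked = pd.concat([left, right], ignore_index=True)
print(stacked.shape)                    # (6, 3)
```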

18. Explain list comprehension and dict comprehension.

List comprehension provides a simple interface for creating new lists from an iterable.

Python
words = ["boy", "bowtie", "cow", "goat", "boat"]
newlist = [x for x in words if "b" in x]
print(newlist)  # Output: ['boy', 'bowtie', 'boat']

Dictionary comprehension provides a simple interface for creating new dictionaries from an iterable.

Python
dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
triple_dict1 = {k: v * 3 for k, v in dict1.items() if v > 2}
print(triple_dict1)  # Output: {'c': 9, 'd': 12, 'e': 15}

19. What is Regex? Can you use Regex to validate an email address?

A regular expression (regex) is a pattern built from ordinary and special characters that describes a set of strings. Python's re module matches text against such patterns.

The re.match function, which matches from the start of the string, can be used for this exercise. A simple pattern checks the basic shape of an email address, although it won't accept every valid address.

Python
import re

email = 'user@example.com'  # placeholder address

def validate_email(email):
    pattern = r'^[a-z0-9_.-]+@[a-z0-9-]+\.[a-z0-9.-]+$'
    match = re.match(pattern, email)
    if match:
        return f"{match.group()} is okay"
    return f"{email} is not valid"

print(validate_email(email))  # Output: user@example.com is okay
ℹ️
Practice answering this interview question, "Build a Regex Parser."

20. Discuss the pros and cons of random forests in classification and regression.

A random forest is an ensemble of decision trees, each trained on a random sample of the data. For classification, it predicts the class with the most votes across the trees; for regression, it averages the trees' predictions.

You may encounter questions about random forests in your machine learning interviews.

Classification:

  1. Select random samples from the dataset with replacement.
  2. Build a decision tree for each sample.
  3. Obtain a prediction from each tree.
  4. Vote.
  5. Select the prediction with the most votes.

Regression:

  1. Select random samples from the dataset with replacement.
  2. Build a decision tree for each sample.
  3. Obtain a prediction from each tree.
  4. Average the predictions.
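The sampling and aggregation steps above can be sketched with the standard library alone. Tree training is deliberately left out, so the "tree outputs" below are made-up values, and the function names are hypothetical. In practice you would likely use a library implementation such as scikit-learn's RandomForestClassifier.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(rows, rng):
    """Step 1: draw len(rows) items from rows with replacement."""
    return rng.choices(rows, k=len(rows))

def majority_vote(predictions):
    """Classification: return the label predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

def average_prediction(predictions):
    """Regression: return the mean of the trees' predictions."""
    return mean(predictions)

rng = random.Random(0)
rows = ["r1", "r2", "r3", "r4", "r5"]  # hypothetical training rows
sample = bootstrap_sample(rows, rng)
print(len(sample), set(sample) <= set(rows))  # 5 True

# Made-up outputs standing in for three trained trees.
print(majority_vote(["spam", "ham", "spam"]))  # spam
print(average_prediction([2.0, 3.0, 4.0]))     # 3.0
```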

Advantages:

  • Reduces overfitting by combining many decision trees trained on different samples.
  • Higher accuracy than a single decision tree.
  • Runs efficiently on large datasets.
  • Provides feature importance.
  • Can be used for both classification and regression problems.

Disadvantages:

  • Harder to interpret than a single decision tree.
  • Slower to train and predict, and uses more memory, because many trees are built and stored.
  • In regression, predictions cannot go beyond the range of target values seen in training.

21. What is the difference between lists, NumPy arrays, and sets in Python? When should you consider one over the other? 

Lists, arrays, and sets are data structures for storing data in Python.

  • Lists: Denoted by [], store an ordered sequence of items of any type. For example, you can store integers, floats, and strings in the same list. List items can be accessed by index and modified in place.
  • NumPy arrays: Created with np.array(), store items of the same data type only. Very efficient for numerical computation compared to lists.
  • Sets: Denoted by {} with items inside (an empty set is written set(), since {} creates a dict). Sets store unique, hashable items of any type and don't allow duplicates. They are unordered, so items cannot be accessed or replaced by index, although items can be added and removed.

Considerations:

  • Use lists for general-purpose, ordered collections that may mix types.
  • Use NumPy arrays for numerical computation due to their speed.
  • Use sets for removing duplicates from a list and for fast membership tests when order doesn't matter.
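A quick sketch of the list and set behavior described above, using only built-ins. The sample values are arbitrary.

```python
readings = [3, 1, 3, 2, 1]  # list: ordered, allows duplicates

unique = set(readings)      # set: duplicates dropped, no ordering
print(sorted(unique))       # [1, 2, 3]
print(2 in unique)          # True (hash-based membership test)

readings[0] = 10            # lists support index assignment
print(readings)             # [10, 1, 3, 2, 1]

empty_set, empty_dict = set(), {}
print(type(empty_set).__name__, type(empty_dict).__name__)  # set dict
```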

Advanced Python and Best Practices

22. Explain the most common Python string functions.

Commonly used Python string methods include:

  • split: Splits a string into a list of substrings, on whitespace by default.
Python
string.split()
  • strip: Removes leading and trailing characters from a string (whitespace by default), such as spaces and commas.
Python
string.strip(',')
  • upper: Converts a string to uppercase.
Python
string.upper()
  • capitalize: Uppercases the first character of a string and lowercases the rest.
Python
string.capitalize()
  • count: Counts the non-overlapping occurrences of a substring in a string.
Python
string.count('the')
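The methods above can be tried on concrete strings. The sample strings here are made up for illustration.

```python
sentence = "the cat saw the hat"
padded = ",,hello,"

print(sentence.split())       # ['the', 'cat', 'saw', 'the', 'hat']
print(padded.strip(","))      # hello
print(sentence.upper())       # THE CAT SAW THE HAT
print(sentence.capitalize())  # The cat saw the hat
print(sentence.count("the"))  # 2
print("other".count("the"))   # 1 (substring match, not whole words)
```

Strings are immutable, so each of these methods returns a new value and leaves the original string unchanged.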

23. Discuss Python unit testing with an example.

The Python unittest module provides the tools needed for running tests. Creating tests ensures that the code runs as expected and prevents accidental bugs when modifying code.

This is done by writing test cases that assert different scenarios, for example, checking that the answer returned by a function is greater than zero. 

Python
import unittest

def name_as_uppercase(name):
    return name.upper()

def check_balance(amount_paid, loan):
    return amount_paid - loan

class TestCases(unittest.TestCase):
    def test_upper(self):
        new_name = name_as_uppercase('derrick')
        self.assertEqual('DERRICK', new_name)

    def test_balance(self):
        balance = check_balance(20, 10)
        self.assertGreaterEqual(balance, 0)

if __name__ == '__main__':
    unittest.main()

24. Discuss different types of variables in Python OOP.

  • Class variables: Defined in the class body, outside any method, and shared by all instances of the class.
Python
class School:
    language = "English"  # class attribute

    def __init__(self, name, location):
        # __init__() sets the initial state of the object
        self.name = name          # instance attribute
        self.location = location  # instance attribute
  • Instance variables: Set on self (usually in __init__) and unique to each class instance.
Python
class School:
    def __init__(self, name, location):
        # __init__() sets the initial state of the object
        self.name = name          # instance attribute
        self.location = location  # instance attribute
  • Local variables: Defined within methods and only available within those methods.
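The three kinds of variables can be seen side by side in one class. This is a minimal sketch; the describe method and the school names are hypothetical.

```python
class School:
    language = "English"  # class variable, shared by all instances

    def __init__(self, name):
        self.name = name  # instance variable, unique to each instance

    def describe(self):
        summary = f"{self.name} teaches in {self.language}"  # local variable
        return summary

first = School("North High")   # hypothetical school names
second = School("South High")

School.language = "French"     # changing the class variable affects every instance
print(first.describe())        # North High teaches in French
print(second.language)         # French

second.language = "Spanish"    # creates an instance variable that shadows the class one
print(first.language)          # French
print(second.language)         # Spanish
```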

25. Differentiate the types of methods in Python OOP.

  • Class methods: Marked with @classmethod and take the class itself as the first parameter, cls. They can read or change class state but have no access to instance variables.
Python
class School:
    language = "English"  # class attribute

    @classmethod
    def chat_motto(cls, motto):
        return f"The motto is: '{motto}', class is '{cls}'"
  • Instance methods: Can access both class and instance variables. They take the instance as the first parameter, conventionally named self.
Python
class School:
    def chat_motto(self, motto):
        return f"The motto is: '{motto}'"
  • Static methods: Marked with @staticmethod. They don't have access to class or instance variables and don't take a specific first parameter such as cls or self.
Python
class School:
    @staticmethod
    def chat_motto(motto):
        return f"The motto is: '{motto}'"
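To see how the three kinds of methods are called, here is a sketch that puts them in one class. The method names introduce, set_language, and is_valid_name are invented for this example.

```python
class School:
    language = "English"

    def __init__(self, name):
        self.name = name

    def introduce(self):              # instance method: receives the instance
        return f"{self.name} teaches in {self.language}"

    @classmethod
    def set_language(cls, language):  # class method: receives the class
        cls.language = language

    @staticmethod
    def is_valid_name(name):          # static method: receives neither
        return bool(name.strip())

school = School("North High")         # hypothetical school name
School.set_language("French")
print(school.introduce())             # North High teaches in French
print(School.is_valid_name("  "))     # False
```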

Python Data Science Interview Tips

Hopefully, these Python questions have given you a glimpse into what to expect in your data science interviews.

Good luck with your upcoming interview!
