Data science interviews often include Python coding questions and statistical analysis.
These questions test your general Python coding skills, as well as your knowledge of popular data science Python libraries such as Pandas and NumPy.
Below, we've compiled a list of the most important Python data science interview questions to help you ace your upcoming interviews.
Each question includes a breakdown of what interviewers expect in your answer and code snippets where applicable.
This guide was written and compiled by Derrick Mwiti, a senior data scientist and course instructor.
These Python data science interview questions will test your knowledge of the basics of Python.
Python is listed as an essential skill in data science job descriptions for companies like Microsoft, Google, Apple, and more.
NumPy arrays are faster than Python lists for numerical work.
NumPy arrays are specialized for numerical computation and efficient mathematical and statistical operations.
They store items of a single data type in a contiguous block of memory, which speeds up access and wastes less memory than a list of separately allocated Python objects.
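As a rough illustration of the point above, the sketch below applies the same operation to a list and to a NumPy array. The sample values and variable names are made up for the example, and it assumes NumPy is installed.

```python
import numpy as np

prices = [10.0, 20.0, 30.0, 40.0]  # hypothetical sample data

# Pure Python: visit the list one element at a time.
doubled_list = [p * 2 for p in prices]

# NumPy: a single vectorized operation over a typed buffer.
price_array = np.array(prices)
doubled_array = price_array * 2

print(doubled_list)
print(doubled_array.tolist())
print(price_array.dtype)                  # one data type for every item
print(price_array.flags["C_CONTIGUOUS"])  # stored in one contiguous block
```

Both approaches give the same values. The difference only becomes noticeable in speed and memory use as the data grows.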
map and applymap are both used for elementwise operations.
However, map is applied to a Series, while applymap is applied to a DataFrame.
In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.
Given multiple iterables, zip yields tuples until the shortest input is exhausted.
Each tuple contains one item from every iterable passed, so the number of tuples equals the length of the shortest iterable.
list1 = [1, 2, 3, 4, 5]
list2 = ['cow', 'goat', 'hen']
list3 = ['the', 'quick', 'brown', 'fox']
list(zip(list1, list2, list3))
[(1, 'cow', 'the'), (2, 'goat', 'quick'), (3, 'hen', 'brown')]
enumerate yields a tuple for each item in an iterable, with the first value being the item's index and the second being the item itself.
This makes it possible to access both the position of an item in a list and its value.
e = enumerate(list3)
list(e)
[(0, 'the'), (1, 'quick'), (2, 'brown'), (3, 'fox')]
A lambda function is an anonymous function declared without the def keyword.
A lambda function has only one expression but can have multiple arguments. It can make code more concise but less readable.
def myfunc(n):
    return lambda a, b, c: a + b + c * n
my_func = myfunc(3)
print(my_func(5, 6, 2))
# Output: 17
map: Applies a function to each item in an iterable.
def myfunc(n):
    return n**2
x = map(myfunc, (1, 2, 3))
list(x)
# [1, 4, 9]
filter: Keeps only the items for which the function returns True and outputs a new iterator.
names = ["Derrick", "Dennis", "Joe"]
def myFunc(x):
    if x.startswith("D"):
        return True
    else:
        return False
final_names = filter(myFunc, names)
for x in final_names:
    print(x)
# Output: Derrick, Dennis
reduce: Applies a function from left to right, reducing the iterable to a single value.
from functools import reduce
reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])
15
del: Deletes objects, lists, parts of a list, and variables.
my_list = [1, "two", 3, "four"]
del my_list
my_list
NameError: name 'my_list' is not defined
clear: Removes all items in a list.
my_list = [1, "two", 3, "four"]
my_list.clear()
my_list
# Output: []
remove: Deletes the first occurrence of a value.
my_list = [1, "two", 3, "four"]
my_list.remove(1)
my_list.remove("four")
my_list
['two', 3]
pop: Removes the item at the specified position and returns it.
my_list = [1, "two", 3, "four"]
my_list.pop(1)
my_list.pop(2)
my_list
# Output: [1, 3]
A module is a file containing Python definitions and statements. A package is a collection of Python modules.
Organizing code in a modular format is better than dumping all functions in a single file.
A package can contain a file named __init__.py that runs initialization code when the package is imported.
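To make the module and package distinction concrete, here is a minimal sketch that builds a throwaway package in a temporary directory and imports it. The names demo_pkg, area.py, and square_area are hypothetical and exist only for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A package is a directory of modules.
    package_dir = Path(tmp) / "demo_pkg"
    package_dir.mkdir()

    # A module is a single file of definitions and statements.
    (package_dir / "area.py").write_text(
        "def square_area(side):\n    return side * side\n"
    )

    # __init__.py runs on import; here it re-exports one function.
    (package_dir / "__init__.py").write_text(
        "from .area import square_area\n"
    )

    sys.path.insert(0, tmp)  # make the temporary package importable
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        result = demo_pkg.square_area(4)
    finally:
        sys.path.remove(tmp)

print(result)  # 16
```

In a real project the package would live in the source tree, so the temporary directory and sys.path handling would not be needed.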
An exception is an error that occurs when your program cannot handle a specific situation, such as trying to open a non-existent file.
with open('somefile.txt') as file:
    read_data = file.read()
FileNotFoundError: [Errno 2] No such file or directory: 'somefile.txt'
Exception handling is important because an unhandled exception stops the execution of the program.
You can handle exceptions using try statements.
try:
    with open('somefile.txt') as file:
        read_data = file.read()
except FileNotFoundError as error:
    print(f"There is an error: {error}")
There is an error: [Errno 2] No such file or directory: 'somefile.txt'
return: Terminates a function and returns a value to the caller. Any code in the function after the return statement is not executed.
def tryexponent():
    return "www.tryexponent.com"
    print("Trying exponent!") # This will not be executed
print(tryexponent())
# Output: www.tryexponent.com
yield: Produces a value from a generator function and pauses it instead of ending it. The function resumes where it left off the next time a value is requested.
def gen_func(x):
    for i in range(x):
        yield i
generator = gen_func(10)
print(next(generator))
# Output: 0
print(next(generator))
# Output: 1
for x in generator:
    print(x)
# Output: 2, 3, 4, 5, 6, 7, 8, 9
A palindrome is a word that reads the same backward as forward, such as "racecar" or "mom".
def is_palindrome(word):
    return word == word[::-1]
print(is_palindrome("madam"))
# Output: True
A decorator is a design pattern that allows you to extend or modify the behavior of a function without changing its source code.
This is possible because functions are first-class citizens in Python.
They can be assigned to variables, passed as arguments to other functions, and returned from other functions.
def titlecase_decorator(function):
    def wrapper():
        func = function()
        make_titlecase = func.title()
        return make_titlecase
    return wrapper
@titlecase_decorator
def make_title():
    return 'learning python decorators'
print(make_title())
# Output: Learning Python Decorators
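The wrapper above only works for functions that take no arguments. A common variation, sketched here with the hypothetical names titlecase_result and greet, forwards any arguments to the wrapped function and uses functools.wraps to keep its name and docstring.

```python
import functools

def titlecase_result(function):
    """Hypothetical decorator: title-case the string a function returns."""
    @functools.wraps(function)  # preserve the wrapped function's metadata
    def wrapper(*args, **kwargs):
        # Forward all arguments, then transform the result.
        return function(*args, **kwargs).title()
    return wrapper

@titlecase_result
def greet(name, greeting="hello"):
    """Build a lowercase greeting."""
    return f"{greeting} {name}"

greeting_text = greet("ada lovelace")
print(greeting_text)   # Hello Ada Lovelace
print(greet.__name__)  # greet
```

Without functools.wraps, greet.__name__ would report wrapper, which makes debugging and documentation harder.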
*args is used for passing a variable number of non-keyword (positional) arguments.
**kwargs is used for passing a variable number of keyword arguments.
def add(*args):
    result = 0
    for value in args:
        result += value
    return result
print(add(10, 25, 27))
# Output: 62
def add(**kwargs):
    result = 0
    for arg in kwargs.values():
        result += arg
    print("The answer is {}".format(result))
add(no_one=2, no_two=3)
# Output: The answer is 5
These are interview questions that specifically test your ability to use Python to solve data science problems.
Example of indexing:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print('3rd element on 1st row: ', arr[0, 2])
# Output: 3rd element on 1st row: 3
Example of slicing:
import numpy as np
matrix = np.arange(1, 17).reshape(4, 4)
print(matrix[2:4, 2:4]) # [start_row:end_row, start_column:end_column]
# Output: [[11, 12], [15, 16]]
.loc is used for label indexing.
.iloc is used for integer indexing.
merge is used for merging data frames based on a certain column. By default, it uses the intersection of the keys in both frames.
join is used for joining data frames based on their index. A left join keeps every key from the left table, meaning that there will be NaNs for values that don't exist in the right table.
concat (concatenate) joins Pandas objects along a particular axis, for example by rows or columns.
List comprehension provides a simple interface for creating new lists from an iterable.
fruits = ["boy", "bowtie", "cow", "goat", "boat"]
newlist = [x for x in fruits if "b" in x]
print(newlist)
# Output: ['boy', 'bowtie', 'boat']
Dictionary comprehension provides a simple interface for creating new dictionaries from an iterable.
dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
triple_dict1 = {k: v * 3 for k, v in dict1.items() if v > 2}
print(triple_dict1)
# Output: {'c': 9, 'd': 12, 'e': 15}
A regular expression (RegEx) is a pattern made up of special and ordinary characters that is used for matching operations in Python.
The re.match function can be used for this exercise.
import re
email = '[email protected]'
def validate_email(email):
    # Raw string so that \. reaches the regex engine unchanged
    pattern = r'^([a-z0-9_.-]+)@([a-z0-9-]+)\.([a-z0-9-.])+$'
    search = re.match(pattern, email)
    if search:
        return f"{search.group()} is okay"
    else:
        return f"{email} is not valid"
print(validate_email(email))
# Output: '[email protected] is okay'
A Random Forest is a collection of decision trees. It selects the class having the most votes from all the trees in the forest.
You may encounter questions about random forests in your machine learning interviews.
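The voting step described above can be sketched in a few lines. This is only an illustration of majority voting, not a full random forest: the majority_vote helper and the sample predictions are hypothetical.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class label predicted by the most trees."""
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label

# Hypothetical predictions from five decision trees for one sample.
votes = ["spam", "ham", "spam", "spam", "ham"]
forest_prediction = majority_vote(votes)
print(forest_prediction)  # spam
```

In practice, a library implementation also handles training each tree on a random sample of the rows and features.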
Lists, arrays, and sets are data structures for storing data in Python.
Lists, created with [], store a sequence of data in multiple formats. For example, you can store integers, floats, and strings in the same list. List items can be accessed using their index location and manipulated.
Arrays, created with array(), store items of the same data type only. They are very efficient for numerical computation compared to lists.
Sets, created with {} around one or more items, allow storage of multiple data types, but individual items cannot be modified in place. Sets also don't allow duplicates. An empty set is created with set(), since {} on its own creates a dictionary.
The top Python string functions include:
split: Splits a string into a list of substrings.
string.split()
strip: Removes trailing or leading characters from a string, such as spaces and commas.
string.strip(',')
upper: Converts a string to uppercase.
string.upper()
capitalize: Capitalizes the first character of a string.
string.capitalize()
count: Counts how many times a substring appears in a string.
string.count('the')
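The short sketch below runs each of the string methods listed above on one sample sentence. The sentence and variable names are made up for the example.

```python
text = ",the quick brown fox jumps over the lazy dog,"

stripped = text.strip(",")           # drop the leading and trailing commas
words = stripped.split()             # split on whitespace into a list
shouted = stripped.upper()           # every letter in uppercase
capitalized = stripped.capitalize()  # only the first character in uppercase
the_count = stripped.count("the")    # occurrences of the substring "the"

print(words)
print(shouted)
print(capitalized)
print(the_count)  # 2
```

Each method returns a new value and leaves the original string unchanged, because Python strings are immutable.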
The Python unittest module provides the tools needed for running tests. Creating tests ensures that the code runs as expected and prevents accidental bugs when modifying code.
This is done by writing test cases that assert different scenarios, for example, checking that the answer returned by a function is greater than zero.
import unittest
def name_as_uppercase(name):
    return name.upper()
def check_balance(amount_paid, loan):
    return amount_paid - loan
class TestCases(unittest.TestCase):
    def test_upper(self):
        new_name = name_as_uppercase('derrick')
        self.assertEqual('derrick'.upper(), new_name)
    def test_balance(self):
        balance = check_balance(20, 10)
        self.assertGreaterEqual(balance, 0)
if __name__ == '__main__':
    unittest.main()
class School():
    language = "English" # class attribute
    def __init__(self, name, location): # __init__() sets the initial state of the object
        self.name = name # instance attribute
        self.location = location # instance attribute

class School():
    def __init__(self, name, location): # __init__() sets the initial state of the object
        self.name = name # instance attribute
        self.location = location # instance attribute
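A class method receives the class itself as its first argument, conventionally named cls.
class School():
    language = "English" # class attribute
    @classmethod
    def chat_motto(cls, motto):
        return f"The motto is: '{motto}', class is '{cls}'"
An instance method receives the instance as its first argument, conventionally named self.
class School():
    def chat_motto(self, motto):
        return f"The motto is: '{motto}'"
A static method receives neither cls nor self.
class School():
    @staticmethod
    def chat_motto(motto):
        return f"The motto is: '{motto}'"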
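To see how the three kinds of methods are called, the sketch below puts them on a single class. The method names describe_language, describe, and format_motto, along with the sample school name, are hypothetical and chosen only for this example.

```python
class School:
    language = "English"  # class attribute shared by every instance

    def __init__(self, name):
        self.name = name  # instance attribute

    @classmethod
    def describe_language(cls):
        # cls is the class, so class attributes are reachable without an instance.
        return f"{cls.__name__} teaches in {cls.language}"

    def describe(self):
        # self is the instance, so instance attributes are reachable.
        return f"{self.name} teaches in {self.language}"

    @staticmethod
    def format_motto(motto):
        # No access to cls or self; behaves like a plain function.
        return f"The motto is: '{motto}'"

school = School("Hypothetical High")
class_result = School.describe_language()
instance_result = school.describe()
static_result = School.format_motto("Learn daily")

print(class_result)     # School teaches in English
print(instance_result)  # Hypothetical High teaches in English
print(static_result)    # The motto is: 'Learn daily'
```

Class and static methods can be called on the class itself, while the instance method needs an object created with School(...).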
Hopefully, these Python questions have given you a glimpse into what to expect in your data science interviews.
Good luck with your upcoming interview!