Do you have an upcoming machine learning system design interview?
These open-ended questions are often considered among the hardest to answer, as they require applying machine learning knowledge to a real-world setting.
In the 45-minute ML system design interview, you'll design a complete system covering data pre-processing, model training and evaluation, and deployment.
Expect to build systems like:
- Spotify recommendations
- Filtering offensive content on YouTube
- Predicting Netflix user watch time
- Personalized LinkedIn job recommendations
These questions evaluate your ability to solve business problems with machine-learning solutions.
To succeed, you must consider efficiency, monitoring, harm prevention, and inference infrastructure development.
A framework can help you manage your time effectively and communicate your ideas clearly under time constraints.
- Step 1: Define the problem. Identify the core ML task and ask clarifying questions to determine the appropriate requirements and tradeoffs. (8 minutes)
- Step 2: Design the data processing pipeline. Illustrate how you’ll collect and process your data to maintain a high-quality dataset. (8 minutes)
- Step 3: Create a model architecture. Come up with a suitable model architecture that would address the needs of the core ML task identified in Step 1 (8 minutes).
- Step 4: Train and evaluate the model. Select a model and explain how you’ll train and evaluate it. (8 minutes)
- Step 5: Deploy the model. Determine how you’ll deploy the model, how it will be served, and how to monitor it. (8 minutes)
- Step 6: Wrap up. Summarize your solution and present additional considerations you would address with more time. (5 minutes)
Step 1: Define the problem
Begin your ML system design by defining the problem, setting interview parameters, and aligning with the interviewer.
This step gauges your ability to scope problems and identify system requirements.
Start your interview by asking about necessary system requirements and checking in with your interviewer on your assumptions.
Specify the model and datasets needed for your system.
- Recommendation: Ranks samples by similarity. Uses a collaborative filtering or content-based filtering model and a large dataset with (user, item, rating) rows.
- Regression: Predicts a continuous scalar value. Uses a regularized linear regression and a dataset mapping features to a scalar value.
- Classification: Categorizes input into various categories. Uses logistic regression and a dataset mapping features to a category.
- Generation: Outputs new samples conditioned on an input. Uses a neural network and a dataset associating input and output space samples.
- Ranking: Predicts an ordering of elements. Uses a regression model for a ranking score and a dataset mapping (element, element set) pairs to a relevance score for the element.
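For the recommendation row format above, a minimal item-based collaborative filtering sketch over hypothetical (user, item, rating) rows might look like this (all user and item names are made up for illustration):

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (user, item, rating) rows, as in the recommendation task above.
ratings = [
    ("alice", "song_a", 5), ("alice", "song_b", 3),
    ("bob",   "song_a", 4), ("bob",   "song_b", 2), ("bob", "song_c", 5),
    ("carol", "song_b", 4), ("carol", "song_c", 5),
]

# Build item -> {user: rating} vectors.
item_vecs = defaultdict(dict)
for user, item, rating in ratings:
    item_vecs[item][user] = rating

def cosine(a, b):
    """Naive cosine similarity: dot over shared users, norms over all users."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

# Rank items most similar to song_a (item-based collaborative filtering).
sims = {i: cosine(item_vecs["song_a"], item_vecs[i])
        for i in item_vecs if i != "song_a"}
ranked = sorted(sims, key=sims.get, reverse=True)
```

In an interview you would not write this out, but naming the representation (a sparse user-item matrix) and the similarity measure shows you understand what the model actually computes.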
Next, establish the system's goals. Identify key requirements and potential tradeoffs.
Consider the following:
- Accuracy and performance: Define the system's minimum accuracy and efficiency. Consider if accuracy can be compromised for performance during traffic peaks.
- Traffic/bandwidth: Estimate the number of simultaneous users and average traffic. Assess traffic distribution and expected Daily Active Users (DAUs).
- Data sources and requirements: Identify available data sources and potential issues like noise or missing values, toxic content, and data privacy or copyright restrictions.
- Computational resources and constraints: Determine available computational resources for model training and serving and the possibility of workload parallelization.
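Traffic estimates like the DAU figure above feed directly into capacity planning. A quick back-of-the-envelope calculation (all numbers here are illustrative assumptions, not from any real system) keeps the discussion concrete:

```python
# Illustrative capacity estimate: every number is an assumption to state aloud.
dau = 10_000_000            # assumed Daily Active Users
requests_per_user = 20      # assumed average requests per user per day
seconds_per_day = 86_400

avg_qps = dau * requests_per_user / seconds_per_day
peak_qps = avg_qps * 3      # assume peak traffic is ~3x the daily average

print(f"average ~{avg_qps:,.0f} QPS, peak ~{peak_qps:,.0f} QPS")
```

Stating the assumptions explicitly (DAU, requests per user, peak multiplier) lets the interviewer correct them early, before they shape the rest of your design.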
Step 2: Design a data processing pipeline
Designing a data pipeline shows your interviewer that you understand the importance of high-quality data, not just high-quality algorithms.
Show your interviewer that you’re thinking about data quality.
- What kind of data is needed? Numbers, text, images, multimodal, etc.
- How will you collect the data? Programmatic labeling, synthetic data augmentation, human annotation, etc.
- Do you need to do any kind of feature engineering? For example, would it be helpful to pre-compute some features, such as categorizing people’s ages into bins of “adolescent,” “adult,” etc.?
- What kind of data pre-processing do you need to do? Tokenization, normalization, encoding categorical features in numerical form, removing low-quality data, imputing missing values, synthetically augmenting data, etc.
- Are there privacy concerns involved with the kind of data you’re using? If so, can you remove identifying information or apply filtering or pre-processing techniques that induce k-anonymity (for sufficiently large k)?
- How do you ensure that no data contamination is occurring? For example, if your data segments are generated by the same process (the same spammer creates multiple spam emails in the same spam classification dataset), then ensure that those segments are in the same split of your data.
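The contamination point above can be made concrete with a group-aware split: keep every sample that shares a generating process (for instance, the same sender in a spam dataset; the `sender` field below is hypothetical) in the same split. A minimal sketch:

```python
import random

# Hypothetical spam dataset: each email records the sender that produced it.
emails = [{"sender": f"user{i % 5}", "text": f"email {i}"} for i in range(20)]

def group_split(samples, group_key, test_frac=0.4, seed=0):
    """Split by group so no group straddles train and test."""
    groups = sorted({s[group_key] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, int(len(groups) * test_frac))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if s[group_key] not in test_groups]
    test = [s for s in samples if s[group_key] in test_groups]
    return train, test

train, test = group_split(emails, "sender")
train_senders = {s["sender"] for s in train}
test_senders = {s["sender"] for s in test}
assert train_senders.isdisjoint(test_senders)  # no sender leaks across splits
```

Libraries such as scikit-learn offer equivalents (e.g., `GroupShuffleSplit`), but the key interview point is recognizing which field defines the group.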
Step 3: Choose a model architecture
Once you have your data, select a suitable ML model architecture. In this step, justify your choice by considering:
- Type of learning problem: Which models fit the core ML task you identified in Step 1?
- Use case: Will this model be used for predictions by another system or interacted with directly by users? Does it require frequent re-training or personalization?
- Simplicity: What's the simplest model that provides enough accuracy?
- Practical constraints: Consider any safety, privacy, storage, and business constraints.
Identify suitable model architectures that meet the system requirements, like latency or memory optimization.
For a classification task, potential architectures range from logistic regression to a deep neural network; for retrieval-heavy problems such as search or recommendation, a two-tower architecture optimized for fast candidate lookup may fit.
Select a model that best addresses the problem, matches available data, and optimizes efficiency, accuracy, sensitivity, and interpretability tradeoffs.
For instance, you might choose a smaller, simpler neural network to cut training cost and inference latency, even if it sacrifices some accuracy.
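The two-tower idea mentioned above can be sketched in a few lines. This is a toy illustration only: the "towers" here are deterministic hashed embeddings standing in for learned neural networks, and the user/item names are invented.

```python
import math
import random

DIM = 8

def embed(key):
    """Stand-in for a learned tower: a deterministic pseudo-random unit vector.

    In a real two-tower model, separate neural networks map users and items
    into a shared embedding space; hashing the key is just for illustration.
    """
    rng = random.Random(key)
    v = [rng.gauss(0, 1) for _ in range(DIM)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def score(u, v):
    """Dot product of the two tower outputs = relevance score."""
    return sum(x * y for x, y in zip(u, v))

user_vec = embed("user:alice")  # hypothetical user id
item_vecs = {item: embed(f"item:{item}") for item in ["a", "b", "c"]}

# At serving time, item embeddings are precomputed and indexed for fast
# nearest-neighbor lookup; here we simply score and sort three items.
ranked = sorted(item_vecs,
                key=lambda i: score(user_vec, item_vecs[i]), reverse=True)
```

The design point worth saying aloud: because the item tower is independent of the user, item embeddings can be precomputed and served from an approximate nearest-neighbor index, which is what makes the architecture search-friendly.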
Step 4: Train and evaluate the model
With your architecture chosen, decide on an optimization algorithm, metrics to monitor during training, and a hyperparameter tuning strategy.
Your training plan may change depending on hardware availability, whether you can parallelize training jobs, and whether you can distribute data and model parameters across multiple devices.
For some tasks, you can fine-tune a pre-trained model instead of training from scratch.
Present your evaluation plan to your interviewer, considering where your model will be used and how an incorrect prediction could impact users.
Evaluation standards include:
- Accuracy: F1, precision, recall, confusion matrices, etc.
- Bias: Group fairness, etc.
- Calibration: Aligning the model’s predictions with the probability of correctness.
- Sensitivity/Robustness: Evaluating how minor changes affect a model’s prediction.
- Comparisons against Baselines: Comparing with the simplest model, a random baseline, or a human baseline.
Discuss the pros and cons of your chosen evaluation metrics, such as how precision@k compares to NDCG@k in a ranking task.
Understanding the tradeoffs among evaluation metrics shows your ability to optimize a model for its purpose.
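The precision@k vs. NDCG@k tradeoff is easy to demonstrate: precision@k ignores ordering within the top k, while NDCG@k rewards placing relevant items earlier. A minimal pure-Python sketch of both metrics:

```python
import math

def precision_at_k(relevances, k):
    """Fraction of the top-k results that are relevant (order-insensitive)."""
    top = relevances[:k]
    return sum(1 for r in top if r > 0) / k

def ndcg_at_k(relevances, k):
    """Discounted cumulative gain of the top k, normalized by the ideal order."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Two rankings with the same relevant items in the top 3, ordered differently.
good_order = [1, 1, 0, 0]  # relevant items first
bad_order = [0, 1, 1, 0]   # relevant items pushed down

# precision@3 is identical, but NDCG@3 prefers the better ordering.
p_good, p_bad = precision_at_k(good_order, 3), precision_at_k(bad_order, 3)
n_good, n_bad = ndcg_at_k(good_order, 3), ndcg_at_k(bad_order, 3)
```

Walking through a contrast like this (same precision@k, different NDCG@k) is a compact way to show you understand what each metric actually rewards.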
Step 5: Deploy the model
Understanding how components fit into the overall picture is crucial. Address these three key points:
- Deployment Timing: Choose appropriate evaluation metrics and testing strategies for your model on production data, like A/B tests, canary deployment, feature flags, or shadow deployment.
- Model Serving: Decide on the hardware (remote or on the edge), optimize and compile the model (NVCC, XLA), and plan for varying user traffic patterns.
- Monitoring: Post-production monitoring is vital for ML systems. Constantly improve performance and benchmark models. Decide on your ground truth dataset, indicators for model performance regression, and troubleshooting tools.
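The deployment-timing options above (A/B tests, canary deployment) share a core mechanic: route a deterministic slice of traffic to the new model. A minimal hash-based router might look like this (a sketch; the 5% fraction and user-id format are assumptions):

```python
import hashlib

CANARY_FRACTION = 0.05  # send ~5% of users to the candidate model (assumption)

def route(user_id, fraction=CANARY_FRACTION):
    """Deterministically route a user to 'canary' or 'stable'.

    Hashing the user id keeps each user's assignment sticky across requests,
    which is what you want for canary deployments and A/B tests.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 10_000  # map hash to [0, 1)
    return "canary" if bucket < fraction else "stable"

assignments = [route(f"user{i}") for i in range(10_000)]
canary_share = assignments.count("canary") / len(assignments)
```

Mentioning why the assignment must be sticky (so a user's experience and metrics are internally consistent) signals practical deployment experience.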
Step 6: Wrap up
In the last few minutes of the interview, review the problem scope, data processing pipeline, and how you would train, evaluate, and deploy the model.
If there’s time, discuss some of the main bottlenecks and tradeoffs of your overall system design.
- Why did you decide that those bottlenecks or tradeoffs would be acceptable?
- How would you scale the system for more data or inference/training requests?
- How would you adjust the model and/or data processing in the future to handle distribution shifts?
Ending with a high-level overview and additional considerations shows the interviewer you have a comprehensive understanding of the system.
You’re also demonstrating your technical design skills by proactively identifying extra components and tradeoffs you’d consider in a less time-constrained setting.
Once you’ve wrapped up, check in with your interviewer to see if they have follow-up questions.
Top ML System Design Interview Questions
You should be prepared to answer a mix of behavioral, coding, conceptual, and system design questions in your ML interviews.
These are some common ML system design questions you can practice with.
- Design a hotel booking chatbot.
- Design a podcast search engine.
- Design a personalized Uber.
- Design a customer support chatbot.
- Design type-ahead search for Stack Overflow.
- Design autocomplete for text messages.
- Design YouTube Search.
- Design visual search for Pinterest.
- Recommend similar artists on Spotify.
- Recommend restaurants on Google Maps.
- Recommend similar products on Amazon.
- Recommend similar videos on YouTube.
- Design a type-ahead search for Netflix.
- Design Spotify’s Discover Weekly.
- Design Uber Shared Rides.
- Recommend trending topics on X (Twitter).
- Design Google’s related searches.
- Design a “people you may know” system.
- Recommend similar homes on Airbnb.
- Recommend similar apartments on Zillow.
- Recommend similar movies on Netflix.
- Design a product recommendation system.
- Recommend similar jobs on LinkedIn.
- Rank Quora answers.
- Filter offensive online comments.
- Filter restricted products from eBay.
- Design a sentiment analysis model.
- Design an automatic recycling system.
- Design a shape-detection system.
- Measure user engagement on Netflix.
- Detect trigger words in an audio clip.
- Filter fake or duplicate schools on Facebook.
- Assess the difficulty of language tests on Duolingo.
- Design blurring for Google Street View.
- How would you reduce wrong orders on DoorDash?
- Classify social media posts by topic.
- Predict the optimal time for commercials on Hulu.
- Filter offensive content on TikTok.
- Design an image classifier.
- Design a fraud-detection system for Stripe.
- Detect the language of a text input.
- Build a dynamic pricing system.
- Predict user behavior changes after product updates.
- Estimate birthdays for a large group.
- Design a framework for evaluating ad rankings.
- Design a personalized newsfeed.
- Design a product ranking system for Amazon.
Common ML System Design Interview Mistakes
- Seeking the “right” answer: In most instances, there are no strictly right or wrong responses. Some answers are better justified than others. Your interviewer will expect you to thoroughly justify your decisions, explaining why you chose your design over other options.
- Relying on state-of-the-art (SotA) models: It can be tempting to look at ML benchmark leaderboards to identify current SotA models for a task. However, this approach often falls short in practice. SotA models typically require more resources to train and run, and they are usually evaluated only on academic benchmarks rather than in real-world scenarios.
- Overcomplicating the model: Many issues can arise when training models. Therefore, it's best to start with a simple, low-capacity version. Once you have a basic solution that would work with clean data, you can enhance the model to handle additional complexities such as messy data and corner cases. Starting with a basic model also allows time for the interviewer to pinpoint the aspects of the ML design they want you to concentrate on. Adapting to these cues demonstrates your ability to collaborate and incorporate feedback into your design.
- Overlooking model evaluation and validation: Clearly explain how you'll initially validate a model learned from some data, which should include both quantitative and qualitative analysis. Also, discuss how ongoing validation will be conducted, such as using a metrics dashboard.