Ready for a high-level look at the structure of system design interviews, the core concepts you'll be assessed on, and how to answer the most common questions?
By the end, you'll understand which parts of system design interviews you feel confident in and which areas need improvement.
We wrote this system design interview guide with the help of 50+ EM, SWE, and TPM technical interview coaches at startups and FAANG+ companies.
The system design interview evaluates your ability to solve a complex problem by designing a system or architecture in a semi-real-world setting.
Your solution doesn't need to be perfect.
Instead, you'll be evaluated on your ability to analyze a problem, design a blueprint, discuss multiple requirements, and weigh the pros and cons to develop a workable solution.
This interview is often used to determine the level at which a candidate will be hired. It is usually accompanied by an engineering manager or behavioral interview.
A hiring manager will assess your ability to make decisions in the face of uncertainty, your confidence in taking risks, and your capacity to adapt to new information.
A system design interview is usually composed of 5 steps:
A typical system design interview lasts about 45 minutes.
A good interviewer leaves a few minutes at the beginning for introductions and a couple at the end for questions.
This is just an estimate. Feel free to modify it based on your own personal style.
Time estimate: 8 minutes
Ask many questions and talk with your interviewer to understand the system's constraints.
Avoid going straight into designing without clarifying the problem.
Also, consider whether the system is being created from scratch.
Non-functional requirements
After you agree on the functional requirements with your interviewer, consider the non-functional requirements of the system design.
These may be related to business objectives or user experience.
Non-functional requirements include availability, consistency, speed, security, reliability, maintainability, and cost.
Ask these questions to the interviewer to understand more about non-functional requirements:
If there are many design constraints and some are more important than others, focus on the most critical ones.
Estimating Data
You can estimate the data volume roughly by performing some quick calculations.
For example, you can present queries per second (QPS), storage size, and bandwidth requirements to your interviewer.
This can help you pick components for your system. It will also give you an idea of scaling opportunities later.
As you estimate data, make some assumptions about user volume and typical user behavior.
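A back-of-the-envelope estimate like this can be sketched in a few lines. The example below is a minimal sketch, not a prescribed method: the `Workload` fields, the peak factor of 2, and the sample numbers (10 million users, 20 writes each, 1 KB per write) are all illustrative assumptions.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


@dataclass
class Workload:
    """Hypothetical workload assumptions for a rough estimate."""
    daily_active_users: int
    requests_per_user_per_day: int
    bytes_per_request: int
    peak_factor: float = 2.0  # assumed ratio of peak to average traffic


def estimate(w: Workload) -> dict:
    """Turn user-behavior assumptions into QPS, storage, and bandwidth."""
    daily_requests = w.daily_active_users * w.requests_per_user_per_day
    avg_qps = daily_requests / SECONDS_PER_DAY
    daily_bytes = daily_requests * w.bytes_per_request
    return {
        "avg_qps": avg_qps,
        "peak_qps": avg_qps * w.peak_factor,
        "storage_gb_per_day": daily_bytes / 1e9,  # decimal gigabytes
        "bandwidth_mb_per_s": daily_bytes / SECONDS_PER_DAY / 1e6,
    }


# Illustrative numbers only: 10M users, 20 writes each, 1 KB per write.
result = estimate(Workload(10_000_000, 20, 1_000))
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

In an interview, rounding aggressively (for example, treating a day as 100,000 seconds) is usually fine. What matters is the order of magnitude.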
Time estimate: 10 minutes
Now, explain how each part of the system will work together.
Start by designing APIs (Application Programming Interfaces).
APIs are like contracts defining how clients can access your system's resources or functionality. This is done with requests and responses.
Think about how a client interacts with our system. Perhaps a client wants to create/delete resources, or maybe they want to read/update an existing resource.
Each requirement should translate to one or more APIs. You can choose what type of APIs you want to use and why—such as:
Consider the request's parameters and the response type.
How will the web server and client communicate?
Next, think about how the client and web server will communicate.
There are several popular options to choose from:
Each has different communication directions and varying performance advantages and disadvantages.
Create a high-level system design diagram.
After designing the API and establishing a communication protocol, the next step is to create a high-level design diagram.
The diagram should serve as a blueprint for your design and highlight the most essential pieces to fulfill the functional requirements.
At this point, you don't need to go into too much detail about each service.
Your goal is to confirm that your design meets all functional requirements. Demonstrate the data and control flow for each requirement to your interviewer.
For instance, in the Twitter design example mentioned above, you could explain to your interviewer how the following features work:
Time estimate: 10 minutes
Next, examine system components and relationships in more detail.
The interviewer may prompt you to focus on a particular area, but don't rely on them to drive the conversation.
How do your non-functional requirements impact your design?
Consider how non-functional requirements impact design choices.
System design questions usually have no unique "correct" answer.
Weighing the trade-offs between different choices to fulfill our system's functional and non-functional requirements is considered one of the most critical skill sets in a system design interview.
Time estimate: 10 minutes
After thoroughly examining the system components, take a step back and evaluate if the system can operate effectively under different conditions and has the flexibility to support future growth.
Consider these points:
Time estimate: 4 minutes
This is the end of the interview, where you can summarize the requirements, justify decisions, suggest alternatives, and answer any questions.
Walk through your decisions, providing justification for each and discussing any space, time, and complexity tradeoffs.
Throughout the discussion, refer back to the requirements periodically.
During a system design interview, candidates often overlook or are not prepared for the evaluation of their leadership behaviors and skills.
In addition to assessing technical skills for designing at scale, the interviewer also tries to answer, "What is it like to work with you, and would I want you on my team?"
To convey the right signals during the interview, it's important to demonstrate leadership skills. You can do this by:
Demonstrating these skills during the interview is critical to receiving a positive evaluation.
The next part of this guide focuses on common system design interview questions.
You'll likely encounter variations of these questions at some point in your interview journey.
Next to each interview question is a list of key concepts you'll need to understand to build an effective solution. If you're unfamiliar with any of these concepts, refer to the resources in our system design interview course.
Some key considerations when designing ChatGPT include:
When designing ChatGPT, several functional requirements, such as creating, updating, viewing, and deleting conversations, need to be considered.
Additionally, rating a response by giving thumbs up or down can help train the model.
Text-based inputs in English are assumed, and inputs go through a sanitization phase to remove profanity and detect insults.
Non-functional requirements, such as latency, security, and scalability, are also essential. The latency for server responses can become lengthy due to extensive processing time on the back end.
Login flows and rate limiting can help mitigate DDoS attempts aimed at the system.
The rate limiter must also be scalable to enable the system to host several users simultaneously without any issues.
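One common way to implement rate limiting is a token bucket. The sketch below is a single-process illustration under stated assumptions: the `TokenBucket` class, its capacity, and its refill rate are hypothetical, and a scalable deployment would need to keep the counters in a shared store rather than in local memory.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock              # injectable so the demo is deterministic
        self.tokens = float(capacity)
        self.updated_at = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.updated_at
        # Add tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Demo with a fake clock: 3-request burst, then 1 request per second.
fake_now = [0.0]
bucket = TokenBucket(capacity=3, refill_per_second=1, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(5)]
fake_now[0] += 2.0                      # two seconds pass
after_wait = [bucket.allow() for _ in range(3)]
print(burst, after_wait)
```

In practice you would keep one bucket per user or per API key, so a single noisy client cannot exhaust the limit for everyone else.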
A scalable database should be designed to store the conversation data, with NoSQL being a good fit.
The average message size is estimated at 100 bytes, meaning that 200 million messages per day translate to 20 GB per day and 7.3 terabytes annually, which amounts to 73 terabytes for ten years.
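Recomputing the storage figures from the same assumptions (100 bytes per message, 200 million messages per day, decimal units, 365-day years) gives:

```latex
\begin{align*}
\text{per day}   &= 2 \times 10^{8}\ \text{messages} \times 100\ \text{B}
                  = 2 \times 10^{10}\ \text{B} = 20\ \text{GB} \\
\text{per year}  &= 20\ \text{GB} \times 365 = 7{,}300\ \text{GB} = 7.3\ \text{TB} \\
\text{ten years} &= 7.3\ \text{TB} \times 10 = 73\ \text{TB}
\end{align*}
```

This ignores replication, indexes, and metadata, all of which would push the real number higher.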
The system's high-level design should include a conversation service that manages user dialogue and the ChatGPT model, the system's core component.
Before a message is processed, it undergoes sanitization and analysis to ensure it meets the established standards.
The model's results are stored in a single conversation database. Finally, a thumbs-down rating indicates that the model needs to be retrained, and a risk model is designed to detect the legitimacy of user ratings.
The conversation service is a REST API with endpoints for creating, deleting, and viewing conversations and for sending messages. Each conversation has a unique ID, as does each message. The user can rate each message with a thumbs up or thumbs down.
The data is stored in a NoSQL database, where the conversation table contains various conversation IDs, while each message contains an ID, text, author, and parent.
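The data model described above can be sketched with two small record types. This is an illustrative in-memory version, not the actual schema: the `Conversation` and `Message` classes, the `rating` field encoding, and the author values are assumptions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    id: str
    text: str
    author: str                    # assumed values: "user" or "assistant"
    parent: Optional[str]          # ID of the message this one replies to
    rating: Optional[int] = None   # assumed encoding: +1 up, -1 down


@dataclass
class Conversation:
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    messages: dict = field(default_factory=dict)  # message ID -> Message

    def send(self, text, author, parent=None):
        msg = Message(uuid.uuid4().hex, text, author, parent)
        self.messages[msg.id] = msg
        return msg

    def thread(self, message_id):
        """Follow parent links to rebuild the thread ending at message_id."""
        chain = []
        current = self.messages.get(message_id)
        while current is not None:
            chain.append(current)
            current = self.messages.get(current.parent) if current.parent else None
        return list(reversed(chain))


conv = Conversation()
question = conv.send("What is a CDN?", "user")
answer = conv.send("A CDN is a distributed network of servers.", "assistant",
                   parent=question.id)
answer.rating = 1  # thumbs up
print([m.author for m in conv.thread(answer.id)])
```

Storing a parent pointer on each message lets the service reconstruct a thread without keeping an ordered list on the conversation record.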
ChatGPT uses a Transformer model, which predicts natural sequences of words.
The model is trained on internet data such as websites, books, and Wikipedia to provide semantically meaningful and grammatically correct replies. To pick each next word, the model can use decoding strategies such as greedy selection, top-K sampling, or nucleus (top-p) sampling, often combined with a temperature setting that controls randomness.
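The difference between these word-selection strategies is easier to see on a toy distribution. The sketch below uses only the standard library; the five-word vocabulary and the logit values are made up for illustration and have nothing to do with a real model.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(probs):
    """Always pick the single most likely token."""
    return max(range(len(probs)), key=probs.__getitem__)


def top_k_candidates(probs, k):
    """Keep only the k most likely tokens."""
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return ranked[:k]


def nucleus_candidates(probs, p):
    """Keep the smallest set of top tokens whose probabilities sum to >= p."""
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


def sample(probs, candidates, rng):
    """Draw one token from the candidates, weighted by probability."""
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


vocab = ["the", "a", "cat", "dog", "xylophone"]   # toy vocabulary
logits = [2.0, 1.5, 1.0, 0.5, -3.0]               # made-up scores
probs = softmax(logits)
sharper = softmax(logits, temperature=0.5)
rng = random.Random(0)
choice = sample(probs, nucleus_candidates(probs, 0.8), rng)
print(vocab[greedy(probs)], top_k_candidates(probs, 2), vocab[choice])
```

Greedy decoding is deterministic, while top-K and nucleus sampling trade some predictability for more varied replies.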
A dataset of question-and-answer pairs is initially used to train the model.
However, since training the model on every possible question-and-answer combination is impossible, a reward model is designed to score responses and select the best ones. The chatbot is then fine-tuned using reinforcement learning, where it is rewarded for giving appropriate responses and penalized for inappropriate ones.
As a future extension, the chatbot design could support different input and output formats, including images, audio, and video.
Even if the model does not have a large dataset initially, it can still provide accurate responses in natural language. The reward model considers the emotion and tone of the question and answer and continuously trains itself to improve accuracy.
A system like YouTube must cater to both content creators and consumers of the content. Content creators should be able to upload videos in different formats from any device, and the system should take care of post-processing.
The viewing experience should be device-agnostic, allowing for a seamless viewing experience across any screen size or device.
Among the non-functional requirements, high availability is crucial, and eventual consistency is an acceptable target. The system needs to account for the ratio of reads to writes, which will heavily favor reads.
To facilitate content creation, the system should have an API for uploading the video itself along with its metadata, with an open socket connection for large video transfers.
The system can use a queueing system for processing videos into different formats and resolutions. The processed video files will be stored in blob storage like S3, with their metadata in a relational database.
In addition, the system can shard the databases by users/creators for scalability.
Viewers will use a streaming service with a Content Delivery Network (CDN) that checks the database to validate permissions and then retrieves the video from the blob storage, storing it in the CDN for future requests.
The system could partition data by geography or genre, but sharding videos by creator makes more sense.
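Sharding by creator usually comes down to mapping a creator ID to a shard number. The sketch below shows one simple way to do that with a stable hash; the `shard_for` helper, the shard count of 8, and the ID format are assumptions for illustration.

```python
import hashlib

NUM_SHARDS = 8  # assumed shard count


def shard_for(creator_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a creator ID to a shard using a hash that is stable across runs."""
    digest = hashlib.sha256(creator_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All of one creator's videos land on the same shard.
home_shard = shard_for("creator-42")
used_shards = {shard_for(f"creator-{n}") for n in range(1000)}
print(home_shard, sorted(used_shards))
```

Plain modulo hashing forces most keys to move when the shard count changes. If you expect to add shards often, mention consistent hashing as an alternative.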
To optimize the user experience, an adaptive bitrate system can adjust streaming quality based on the user's connection speed. Additionally, an in-memory cache can be implemented to prevent overload during high-traffic periods, and an eviction policy like LRU (least recently used) can be used to remove rarely requested videos from the CDN.
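LRU (least recently used) is simple enough to sketch on a whiteboard. The version below is a minimal single-threaded illustration built on Python's `OrderedDict`; the `LRUCache` class and the video keys are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that drops the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # reading counts as a use
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("video-a", "bytes-a")
cache.put("video-b", "bytes-b")
cache.get("video-a")              # video-a is now the most recently used
cache.put("video-c", "bytes-c")   # evicts video-b
print(cache.get("video-a"), cache.get("video-b"), cache.get("video-c"))
```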
For analytics purposes, a stream or analytics system can be established.
One suggestion for improving the design is to add more fault tolerance measures, such as a master-slave configuration for the database.
Additionally, space and cost optimization should be considered, especially for a data-heavy system like a video streaming service.
This is a common Amazon system design interview question.
Consider dividing the system into public and internal endpoints and using a web or mobile app that interfaces with a back-end server.
To avoid double booking, a reservation system should use strong consistency rather than eventual consistency.
This approach would involve using read locks on replicas before writing to the Postgres database to keep the data as up-to-date as possible.
Additionally, sharding based on location can improve consistency, and read replicas and load balancing can help distribute load and maintain performance.
The data schema for the system should include a reservations table with foreign keys for garage ID and spot ID, plus start and end times and payment status. The garage table should have an ID that is a primary key, a zip code, and rates based on vehicle size.
The spot table should include a serial primary key, a foreign key for garage ID, and a status that can be reserved, unavailable, or empty.
The optional users table may be added, including an ID that is the primary key, an email, first and last name, vehicle type, and user ID.
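The schema above, together with a double-booking check, can be sketched as follows. This is an illustration under stated assumptions: it uses SQLite as a stand-in for Postgres, and the column names, rate columns, and `reserve` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the Postgres database
conn.executescript("""
CREATE TABLE garages (
    id INTEGER PRIMARY KEY,
    zip_code TEXT NOT NULL,
    rate_compact REAL NOT NULL,   -- assumed: one rate column per vehicle size
    rate_large REAL NOT NULL
);
CREATE TABLE spots (
    id INTEGER PRIMARY KEY,
    garage_id INTEGER NOT NULL REFERENCES garages(id),
    status TEXT NOT NULL CHECK (status IN ('reserved', 'unavailable', 'empty'))
);
CREATE TABLE reservations (
    id INTEGER PRIMARY KEY,
    garage_id INTEGER NOT NULL REFERENCES garages(id),
    spot_id INTEGER NOT NULL REFERENCES spots(id),
    start_time TEXT NOT NULL,     -- ISO 8601 strings sort chronologically
    end_time TEXT NOT NULL,
    paid INTEGER NOT NULL DEFAULT 0
);
""")


def reserve(conn, garage_id, spot_id, start, end):
    """Insert a reservation unless it overlaps an existing one for the spot."""
    with conn:  # the check and the insert commit together
        clash = conn.execute(
            "SELECT 1 FROM reservations "
            "WHERE spot_id = ? AND start_time < ? AND end_time > ?",
            (spot_id, end, start),
        ).fetchone()
        if clash:
            return False
        conn.execute(
            "INSERT INTO reservations (garage_id, spot_id, start_time, end_time) "
            "VALUES (?, ?, ?, ?)",
            (garage_id, spot_id, start, end),
        )
        return True


conn.execute("INSERT INTO garages VALUES (1, '98101', 2.0, 5.0)")
conn.execute("INSERT INTO spots VALUES (1, 1, 'empty')")
first = reserve(conn, 1, 1, "2024-01-01T09:00", "2024-01-01T11:00")
overlap = reserve(conn, 1, 1, "2024-01-01T10:00", "2024-01-01T12:00")
later = reserve(conn, 1, 1, "2024-01-01T11:00", "2024-01-01T12:00")
print(first, overlap, later)
```

The check-then-insert pattern is only safe here because a single connection is used. With concurrent writers, the database itself has to enforce the rule, for example through locking or a constraint.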
Remember the potential trade-offs, such as flexibility versus enums in the vehicle type column and load balancing versus maintaining consistency.
Consider using an existing third-party payment system instead of developing one in-house to save time and resources.
To design a minimum viable product (MVP) for a two-sided network like Twitter, focus on tweet creation, generating a timeline, and allowing users to follow others and interact with tweets.
These endpoints would be handled by multiple servers behind load balancers.
For timeline generation, active users' timelines are updated using a stack-like approach, adding daily tweets and considering user engagement for optimal timeline generation.
Influencers require a more dynamic approach, with data added to timelines on the fly when requested.
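The two timeline strategies can be combined: precompute timelines for ordinary authors when they tweet, and merge influencer tweets in at read time. The sketch below is an in-memory illustration only; the follower threshold, the function names, and the data structures are assumptions.

```python
from collections import defaultdict, deque

INFLUENCER_THRESHOLD = 3   # assumed cutoff; a real one would be far larger
TIMELINE_SIZE = 100        # assumed cap on precomputed timeline length

followers = defaultdict(set)          # author -> users who follow them
following = defaultdict(set)          # user -> authors they follow
tweets_by_author = defaultdict(list)  # author -> [(seq, author, text)]
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_SIZE))  # precomputed
_seq = 0                              # global ordering stand-in for timestamps


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def is_influencer(author):
    return len(followers[author]) >= INFLUENCER_THRESHOLD


def post(author, text):
    global _seq
    _seq += 1
    entry = (_seq, author, text)
    tweets_by_author[author].append(entry)
    if not is_influencer(author):
        # Fan-out on write: push to each follower's precomputed timeline.
        for user in followers[author]:
            timelines[user].appendleft(entry)


def read_timeline(user, limit=10):
    merged = set(timelines[user])     # set avoids duplicates after promotion
    for author in following[user]:
        if is_influencer(author):
            # Fan-out on read: pull influencer tweets when requested.
            merged.update(tweets_by_author[author][-limit:])
    newest_first = sorted(merged, reverse=True)[:limit]
    return [(author, text) for _, author, text in newest_first]


follow("ann", "bob")
for fan in ("ann", "cat", "dan"):
    follow(fan, "star")
post("bob", "hi")
post("star", "big news")
ann_view = read_timeline("ann")
print(ann_view)
```

Skipping the write-time fan-out for influencers is what keeps a single popular account from triggering millions of timeline writes per tweet.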
High availability is maintained through multiple servers and databases set up in a master-slave configuration.
As a read-heavy system, Twitter must optimize for reads and high availability while ensuring influencers cannot bring the system down.
Additional non-functional requirements include eventual consistency, low latency, and high availability. Interaction with multimedia and text content is similar, and the system would need algorithms that prioritize user engagement to surface the subset of tweets generating the most interaction.
Consider potential security issues such as DDoS attacks. Login flows and rate limiting can help mitigate DDoS attempts aimed at the system.
When it comes to making this application scalable, we can have one of our API servers read from a separate cache for our newsfeed. In doing so, we should also use a feed service to refresh our feed cache regularly.
Use blob storage services like Amazon S3 or Google Cloud Storage to handle static content. This will give you fast response times and high availability.
Use a strong SQL database management system like PostgreSQL that works well with large-scale indexing.
Using a cache for frequently-accessed content will ensure that it's easy to get to and will give you faster response times. Using in-memory caching with Redis or Memcached for metadata could also improve performance.
Balance the trade-offs between consistency, availability, and partition tolerance. The CAP theorem can help you decide which factors are most important in your application. A load balancer and sharding can help balance the load on the database server.
Use a content delivery network (CDN) for low-latency delivery of globally-distributed media.
With an estimated 200 million users, optimizing for global impact is a key part of the design process.
To store compressed video files and user metadata, you need an always-available system that responds quickly and can grow to support more users. TikTok has one million users who use it every day, so the system must be able to handle that many people.
An API can be designed with endpoints such as "Upload Video" and "Get User Activity."
A relational database system like PostgreSQL can store and link structured user data objects.
To reduce the load on the database and improve latency, preloading a cache of the top ten videos for each user can be implemented.
To associate user activity, such as liking and following videos, with specific videos, a video UUID and ID field is required.
User activity can be stored in a database, and an API endpoint can be developed for a GET request to return a list of the user's likes and followers.
To handle a 10x increase in traffic, a Content Delivery Network (CDN) can cache and route video content traffic to the closest node.
A load balancer can also be used for scaling deployments and performing zero downtime deployments.
A regional database sharding service can facilitate load distribution between databases and a read-only worker.
A pre-caching service can be implemented to manage GET requests and preloading video content.
The design of Uber Eats requires attention to both functional and non-functional requirements.
First, define the needs of all stakeholders – restaurants, customers, and delivery people.
The top priority is to design a system that allows restaurants to add their information and customers to view and search for nearby restaurants based on delivery time and distance. Eventual consistency is acceptable for restaurant information, as long as updates propagate reliably so that customers see accurate details.
High availability, security, scalability, and latency are all critical non-functional requirements. The expected numbers of daily views, customers, and restaurants must be taken into account.
Successful design involves data modeling for restaurant and menu items, including geohashing for optimizing delivery routes.
To optimize proximity comparisons between different locations such as restaurants and customers, the system subdivides the world into smaller grids and maps them into binary values encoded using base 32.
This encoding system optimizes proximity matching between different locations. Data modeling involves the use of relational and NoSQL databases for different data tables and optimizing the user experience for uploading new restaurant information.
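The grid subdivision and base-32 encoding described above can be sketched as a small geohash encoder. This is a minimal illustration of the standard geohash scheme; the function name, the default precision, and the sample coordinates are choices made for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=6):
    """Halve the lon/lat ranges alternately; emit one character per 5 bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True                      # geohash starts with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:              # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# Two nearby points (illustrative coordinates) fall in the same grid cell.
restaurant = geohash_encode(57.64911, 10.40744, precision=5)
customer = geohash_encode(57.6495, 10.4080, precision=5)
print(restaurant, customer)
```

A shared prefix means two points are in the same cell, so a prefix lookup finds nearby restaurants cheaply. Two points can still be close while sitting in adjacent cells, so real searches typically check neighboring cells as well.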
Running an Elasticsearch service on top of Cassandra, with separate databases where needed, helps with sharding and scaling.
Multiple services such as a search service, viewing service, and restaurant service could be employed for efficiency, with the search service incorporating Elasticsearch and geohashing.
Isochrones and polygons are useful for estimating delivery times and identifying delivery areas. The viewing service could include caching frequently viewed menus and leveraging isochrones to estimate delivery times for customers.
The restaurant service could track all events using Kafka queues and implement an API for moderation, optimizing the system's efficiency.
Scalability, optimization, and customer experience are critical factors in Uber Eats' design.
By considering all stakeholders and their requirements, an efficient system can be developed to provide a seamless experience.
The last part of this guide is a breakdown of the fundamental principles and concepts of designing scalable systems.
Network and web protocols are the rules and standards that govern how information is transmitted over the internet.
Refresh your knowledge of these key web protocols before your interview:
Load balancing is a technique used to distribute incoming traffic across multiple servers or resources to ensure that no single server becomes overloaded and unable to handle the traffic.
It allows a system to scale horizontally, meaning it can handle a larger workload by adding more servers or resources rather than relying on a single, powerful server.
Load balancers are essential to many modern technical systems and frequently come up in system design interviews.
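Round robin is one of the simplest load-balancing strategies and a reasonable one to sketch in an interview. The example below is a minimal in-process illustration; the `RoundRobinBalancer` class, the server names, and the manual health marking are assumptions, not how any particular load balancer product works.

```python
class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        if not self.servers:
            raise ValueError("at least one server is required")
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
before = [lb.pick() for _ in range(6)]
lb.mark_down("app-2")                 # simulate a failed health check
after = [lb.pick() for _ in range(4)]
print(before, after)
```

Adding a server to the list is all it takes to scale horizontally in this model. Strategies such as least connections or weighted round robin follow the same shape with a different `pick` rule.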
A content delivery network (CDN) is a distributed network of servers that delivers content, such as web pages, web documents, images, and videos, to users based on their geographic location.
CDNs replicate content across a network of servers located in strategic locations around the world. When a user requests content, the CDN determines the user’s location and subsequently directs the request to the server that is closest to the user. In doing so, CDNs reduce latency and improve the overall user experience.
According to the CAP theorem, a distributed database system cannot guarantee all three of the following properties simultaneously: consistency, availability, and partition tolerance.
Instead, a system must choose between consistency and availability in the face of network partitions.
Databases are a critical component of technical systems and will inevitably be involved in your system design interviews.
A database is a structured collection of data that is stored and accessed electronically.
There are many kinds of databases you can choose from when designing systems:
Understand the trade-offs between different database technologies and how to choose the best database for a particular application.
Caching stores frequently accessed data in a temporary storage location, typically in memory, to improve the performance of a system.
Caching is commonly used in system design because it can significantly improve the speed at which a system can retrieve data. There are several types of caching that are used often:
System design interview questions are similar to coding questions in that they are fundamentally technical.
However, they differ in a few key ways:
You will need knowledge and comfort with diverse technologies to effectively answer these interview questions.
Engineers, for example, will need to elaborate deeply on the systems within their areas of expertise.
However, management roles, such as TPM, need a much broader knowledge of the systems and technologies they use.
These are some of the most commonly asked questions around prepping for these tough interviews.
Yes and no. Amazon asks system design questions in their engineering interviews.
However, they don't ask these types of questions to freshers and recent graduates. System design questions are usually only asked in interviews for experienced positions (4-5 years of experience).
Yes, Google asks system design questions. They are asked in the technical interview rounds that follow the initial phone screens.
Your initial phone screens won't have any system design elements to them.
Instead, you'll be asked about algorithms and data structures. You'll encounter system design questions if you're advanced to the next interview round.
To pass the Google system design interview, focus on your whiteboarding skills.
System design interview questions are notoriously difficult to prepare for. Unlike algorithmic questions, they don't reduce down to a handful of prescribed patterns. Instead, they require years of technical knowledge and experience to answer well.
For junior engineers, this can be tricky. Even senior developers sometimes find themselves scratching their heads trying to understand a system.
The key to doing well in these types of interviews is to use your entire knowledge base to think about scalability and reliability in your answer.
The high-level design focuses on the problem you're trying to solve. The low-level design breaks down how you'll actually achieve it. That includes breaking down systems into their individual components and explaining the logic behind each step. System design interviews focus on both high-level and low-level design elements so that your interviewer can understand your entire thought process.