Ready for a high-level look at the structure of system design interviews, the core concepts you'll be assessed on, and how to answer the most common questions?
By the end, you'll understand which parts of system design interviews you feel confident in and which areas need improvement.
We wrote this system design interview guide with feedback from 50+ EM, SWE, and TPM technical interview coaches at startups and FAANG+ companies like Microsoft, Amazon, Meta, Google, Netflix, Dropbox, and Stripe.
It was reviewed and edited by senior engineers and managers, and it includes contributions by Anthony Pellegrino.
The system design interview evaluates your ability to solve a complex problem by designing a system or architecture in a semi-real-world setting.
Your solution doesn't need to be perfect.
Instead, the hiring manager will assess your ability to make decisions in the face of uncertainty, your confidence in taking risks, and your capacity to adapt to new information.
These are the most common questions you'll encounter during technical interviews, drawn from our database of frequently asked system design interview questions for EMs, TPMs, and SWEs.
Social media platforms allow users to post photos and videos, follow and unfollow each other, like and comment on posts, search for content, and get personalized newsfeeds.
Chat applications enable users to send instant messages to each other.
Popular applications like WeChat and WhatsApp support billions of users worldwide across a huge range of devices.
Chat apps require support for one-on-one communication, group threads, and sending text, images, videos, and files.
Streaming platforms let users stream content on demand, get personalized recommendations, create multiple user profiles, and search an extensive content library.
Similarly, cloud storage applications efficiently handle storing and retrieving large amounts of data.
Real-time collaboration with virtual tools has become commonplace as working remotely gains popularity.
To build these systems, consider real-time collaboration, version control, access control, and notifications.
Generating recommendations based on a user's current location can be essential for finding restaurants, hotels, and points of interest while on the go.
Location-based systems enable users to search for locations, obtain directions, estimate distances and travel times, and receive contextual search results.
Machine learning systems rely on ML models to process data and produce predictions.
You must know how to send data to a model, receive and process a response, and continuously improve the model based on feedback about the quality of its output.
Payment systems must keep track of inventory, handle transactions, issue receipts, and prevent orders if a product is out of stock or unavailable.
When matching players in online multiplayer video games, consider factors such as latency, player skill levels, and match settings.
A system design interview is usually composed of 5 steps: gathering requirements, creating a high-level design, deep-diving into the components, identifying bottlenecks and scaling, and wrapping up.
A typical system design interview lasts between 45 and 60 minutes.
Good interviewers will leave a few minutes in the beginning for introductions and a couple at the end for questions.
This is just an estimate; adjust it to fit your own interviewing style.
Time estimate: 8 minutes
Start by gathering more information from your interviewer about the system's constraints.
Use a combination of context clues and direct questions to clarify what the system must do and who will use it.
Once you've identified and confirmed the functional requirements with your interviewer, consider the non-functional requirements of the system design.
These may be related to business objectives or user experience.
Non-functional requirements include qualities like performance, scalability, reliability, and security. The table below lists questions you can ask to identify each one.
If there are many design constraints and some are more important than others, focus on the most critical ones.
For example, if designing a Twitter timeline, focus on Tweet posting and timeline generation services instead of user registration or how to follow another user.
| Requirement | Question |
| --- | --- |
| Performance | How fast is the system? |
| Scalability | How will the system respond to increased demand? |
| Reliability | What is the system’s uptime? |
| Resilience | How will the system recover if it fails? |
| Security | How are the system and data protected? |
| Usability | How do users interact with the system? |
| Maintainability | How will you troubleshoot the system? |
| Modifiability | Can users customize features? Can developers change the code? |
| Localization | Will the system handle multiple currencies and languages? |
You can roughly estimate the data volume with some quick back-of-the-envelope calculations.
For example, you can present queries per second (QPS), storage size, and bandwidth requirements to your interviewer.
This can help you pick components for your system. It will also give you an idea of scaling opportunities later.
As you estimate data, make some assumptions about user volume and typical user behavior.
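For example, a rough back-of-the-envelope calculation might look like the sketch below. The user counts, post rates, and sizes are illustrative assumptions, not targets you need to memorize.

```python
# Back-of-the-envelope estimation with assumed, illustrative numbers.
SECONDS_PER_DAY = 24 * 60 * 60

daily_active_users = 10_000_000   # assumption
writes_per_user_per_day = 2       # assumption: each user posts twice a day
reads_per_user_per_day = 100      # assumption: read-heavy workload
avg_post_size_bytes = 1_000       # assumption: ~1 KB per post

write_qps = daily_active_users * writes_per_user_per_day / SECONDS_PER_DAY
read_qps = daily_active_users * reads_per_user_per_day / SECONDS_PER_DAY
daily_storage_gb = daily_active_users * writes_per_user_per_day * avg_post_size_bytes / 1e9
yearly_storage_tb = daily_storage_gb * 365 / 1e3

print(f"Write QPS: ~{write_qps:,.0f}")
print(f"Read QPS:  ~{read_qps:,.0f}")
print(f"Storage:   ~{daily_storage_gb:,.0f} GB/day, ~{yearly_storage_tb:,.1f} TB/year")
```

Presenting numbers like these early makes it easier to justify later choices, such as whether you need sharding or a CDN.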
Time estimate: 10 minutes
Next, explain how each part of the system will work together.
Start by designing APIs (Application Programming Interfaces).
APIs define how clients can access your system's resources or functionality via requests and responses.
Consider how clients interact with the system and the types of data they're passing through.
Clients may want to create/delete resources or read/update existing ones.
Each system requirement should translate to one or more APIs.
At this step, choose which type of API you want to use and why—for example, REST, GraphQL, or gRPC.
Consider the request's parameters and the response type.
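As a sketch, a REST-style API for a hypothetical tweet service could expose endpoints like the ones below. The framework (Flask), paths, and fields are assumptions for illustration, not a prescribed design.

```python
# Illustrative REST API sketch using Flask; endpoints and fields are assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/v1/tweets")
def create_tweet():
    """Create a tweet from the request body."""
    body = request.get_json()
    # In a real system: authenticate, validate input, write to storage.
    tweet = {"id": 123, "author_id": body["author_id"], "text": body["text"]}
    return jsonify(tweet), 201

@app.get("/v1/users/<int:user_id>/timeline")
def get_timeline(user_id: int):
    """Return a page of the user's home timeline."""
    limit = int(request.args.get("limit", 20))   # pagination parameter
    # In a real system: read from a timeline cache or fan-out store.
    return jsonify({"user_id": user_id, "tweets": [], "limit": limit})

if __name__ == "__main__":
    app.run(debug=True)
```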
Next, think about how the client and web server will communicate.
There are several popular options to choose from:
Each has different communication directions and varying performance advantages and disadvantages.
| | Pros | Cons |
| --- | --- | --- |
| Ajax polling | Easy to implement; works with all browsers | High server load; high latency |
| Long polling | Lower latency than regular polling; fewer wasted requests | Server must hold many open connections; more complex to implement |
| WebSockets | Full-duplex, real-time communication | Requires a more complex server setup and connection management |
| Server-sent events | Efficient, low latency; works over plain HTTP | Unidirectional (server to client); not supported by all browsers |
Once you've designed the API and established a communication protocol, determine the core database data models.
This includes the core entities, their attributes, and the relationships between them.
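For the Twitter-like example used elsewhere in this guide, the core data model might be sketched like this. The entities and fields are assumptions you would confirm with your interviewer.

```python
# Illustrative data model for a Twitter-like system; entities and fields are assumptions.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class User:
    id: int
    username: str
    created_at: datetime

@dataclass
class Tweet:
    id: int
    author_id: int        # references User.id
    text: str
    created_at: datetime

@dataclass
class Follow:
    follower_id: int      # references User.id
    followee_id: int      # references User.id
    created_at: datetime
```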
After designing the API, establishing a communication protocol, and building a rough data model, the next step is to create a high-level design diagram.
The diagram should serve as a blueprint for your design.
It highlights the most essential pieces to fulfill the functional requirements.
You don't need to go into too much detail about each service yet.
Your goal at this step is to confirm that your design meets all functional requirements.
Demonstrate the data and control flow for each requirement to your interviewer.
In the Twitter design example, you could walk your interviewer through how features like posting a Tweet, generating the home timeline, and following another user work.
Time estimate: 10 minutes
Next, examine system components and relationships in more detail.
The interviewer may prompt you to focus on a particular area, but don't rely on them to drive the conversation.
Consider how non-functional requirements impact design choices.
System design questions have no "correct" answer. Every question can be answered in multiple ways.
The most important skill of a system design interview is your ability to weigh trade-offs as you consider functional and non-functional requirements.
Time estimate: 10 minutes
After thoroughly examining the system components, take a step back.
Are there any bottlenecks in this system? How well does it scale?
Evaluate if the system can operate effectively under different conditions and has the flexibility to support future growth.
Pay particular attention to where the design could be decoupled and where buffering could absorb spikes in traffic.
Decoupling backend services is crucial for achieving scalability and reliability in system design.
By breaking down processes and implementing queuing mechanisms to manage traffic, systems can be optimized for high performance at scale.
An example of event-driven architecture is Pramp, a peer-to-peer mock-interview tool for software engineers.
On Pramp, registering a user is handled as an asynchronous event, involving multiple services working in tandem.
Message Queues (MQs) play a pivotal role in enabling orderly and efficient message transmission to a single receiver.
On the other hand, Publish-Subscribe (Pub/Sub) systems excel at broadcasting information to multiple subscribers simultaneously.
Examples include RabbitMQ and Amazon SQS for point-to-point message queues, and Apache Kafka and Google Cloud Pub/Sub for publish-subscribe messaging.
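To make the decoupling idea concrete, here's a minimal producer/consumer sketch using Python's standard-library queue module. The user-registration event is a hypothetical example echoing the Pramp scenario above; in production, the in-process queue would be a message broker.

```python
# Minimal producer/consumer sketch; the event type is hypothetical.
import queue
import threading

events = queue.Queue()  # in a real system this would be a broker (e.g., SQS, Kafka)

def producer():
    """The API server enqueues an event instead of doing the work inline."""
    events.put({"type": "user_registered", "user_id": 42})

def consumer():
    """A background worker drains the queue and processes events asynchronously."""
    while True:
        event = events.get()
        if event is None:          # sentinel to stop the worker
            break
        print(f"Processing {event['type']} for user {event['user_id']}")
        events.task_done()

worker = threading.Thread(target=consumer)
worker.start()
producer()
events.put(None)   # signal shutdown
worker.join()
```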
When discussing bottlenecks, structure your answer around the two or three most important limitations to keep it concise.
Time Estimate: 4 minutes
This is the end of the interview. You can summarize the requirements, justify decisions, suggest alternatives, and answer any questions.
Walk through your decisions, providing justification for each and discussing any space, time, and complexity tradeoffs.
Throughout the discussion, refer back to the requirements periodically.
System design interviews help determine the level at which a candidate will be hired.
For junior engineers and new graduates, system design carries less weight. Junior candidates are expected to know the basics, but not every detailed concept.
For instance, junior candidates don't need to know when to use NGINX or AWS' native load balancer. They only need to know that a load balancer is necessary.
However, for senior, staff, and lead candidates, having an in-depth understanding of system design and various trade-offs becomes vital.
Having more than one system design interview for higher-level roles is common.
During a system design interview, candidates often overlook the fact that their leadership behaviors and skills are also being evaluated.
In addition to assessing technical skills for designing at scale, the interviewer also tries to answer, "What is it like to work with you, and would they want you on their team?"
You can demonstrate leadership skills in an interview by driving the conversation, communicating trade-offs clearly, and incorporating your interviewer's feedback.
Demonstrating these skills during the interview is critical to receiving a positive evaluation.
When designing a system like ChatGPT, several functional requirements need to be considered, such as creating, updating, viewing, and deleting conversations.
Additionally, rating a response by giving thumbs up or down can help train the model.
Text-based inputs in English are assumed, and inputs and outputs go through a sanitization phase to remove profanity and detect insults.
Server response latency can become high due to extensive processing time on the back end.
Login flows and rate limiting can prevent DDoS attempts. The rate limiter must also be scalable so the system can serve many users simultaneously without issues.
A scalable database should be designed to handle conversation storage, with NoSQL being an ideal option.
The average message size is estimated at 100 bytes, so 200 million messages per day translate to 20 GB per day, roughly 7.3 TB per year, and about 73 TB over ten years.
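As a quick sanity check of those numbers, using the assumptions stated above:

```python
# Quick check of the storage estimate using the assumptions above.
messages_per_day = 200_000_000
avg_message_size_bytes = 100

bytes_per_day = messages_per_day * avg_message_size_bytes
gb_per_day = bytes_per_day / 1e9          # ~20 GB/day
tb_per_year = gb_per_day * 365 / 1e3      # ~7.3 TB/year
tb_per_decade = tb_per_year * 10          # ~73 TB over ten years

print(f"{gb_per_day:.0f} GB/day, {tb_per_year:.1f} TB/year, {tb_per_decade:.0f} TB/decade")
```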
The system's high-level design should include a conversation service. This manages user inputs and the ChatGPT model, the system's core component.
Before a message is processed, it undergoes sanitization and analysis. This ensures it meets the standards required for processing.
The model's results are stored in a single conversation database.
Finally, a thumbs-down rating indicates that the model needs to be retrained, and a risk model is designed to detect the legitimacy of user ratings.
The conversation service is a REST API that supports creating, deleting, and viewing conversations and sending messages. Each conversation has a unique ID, and each message within it has its own ID. The user can rate each message with a thumbs up or thumbs down.
The data is stored in a NoSQL database, where the conversation table contains various conversation IDs, while each message contains an ID, text, author, and parent.
ChatGPT uses a Transformer model, which predicts natural sequences of words.
The model is trained on internet data such as websites, books, and Wikipedia to provide semantically meaningful and grammatically correct replies. It can use decoding strategies such as greedy decoding, top-K sampling, or nucleus (top-p) sampling with temperature to select its output.
A dataset of question-and-answer pairs is initially used to train the model.
However, since training the model on every possible question-and-answer combination is impossible, a reward model is designed to score responses. The chatbot is then refined with reinforcement learning, rewarded for appropriate responses and penalized for inappropriate ones.
The chatbot design is expected to support different input and output formats, including images, audio, and video.
Even if the model does not have a large dataset initially, it can still provide accurate responses in natural language. The reward model considers the emotion and tone of the question and answer and continuously trains itself to improve accuracy.
A system like YouTube must cater to both content creators and consumers of the content. Content creators should be able to upload videos in different formats from any device, and the system should take care of post-processing.
The viewing experience should be device-agnostic, allowing for a seamless viewing experience across any screen size or device.
Among the non-functional requirements, high availability is crucial, while eventual consistency is acceptable. The system also needs to account for the read-to-write ratio, which will heavily favor reads.
To facilitate content creation, the system should have an API for uploading metadata around the video and the video itself, with an open socket connection for large video transferring.
The system can use a queueing system to process videos into different formats and resolutions, storing the video files in blob storage like S3 and the metadata in a relational database.
In addition, the system can shard the databases by users/creators for scalability.
Viewers will use a streaming service with a Content Delivery Network (CDN) that checks the database to validate permissions and then retrieves the video from the blob storage, storing it in the CDN for future requests.
The system could partition data by geography or genre, but sharding by creator makes the most sense.
To optimize the user experience, an adaptive bitrate system can adjust streaming quality based on the user's connection speed. Additionally, an in-memory cache can be implemented to prevent overload during high-traffic periods, and an invalidation strategy like LRU can be used to efficiently remove outdated videos from the CDN.
For analytics purposes, a stream or analytics system can be established.
One suggestion for improving the design is to add more fault-tolerance measures, such as a primary-replica (master-slave) configuration for the database.
Additionally, space and cost optimization should be considered, especially for a data-heavy system like a video streaming service.
This is a common Amazon system design interview question.
Consider dividing the system into public and internal endpoints and using a web or mobile app that interfaces with a back-end server.
To ensure high consistency and avoid double booking, use a strong consistency approach instead of an eventual consistency approach for a reservation system.
This approach would involve using read locks on replicas before writing to the Postgres database to keep the data as up-to-date as possible.
Additionally, sharding based on location can improve consistency, and read replicas and load balancing can help distribute load and maintain performance.
The data schema for the system should include a reservations table with foreign keys for garage ID, spot ID, start and end time, and payment status. The garage table should have an ID that is a primary key, a zip code, and rates based on vehicle size.
The spot table should include a serial primary key, a foreign key for garage ID, and a status that can be reserved, unavailable, or empty.
An optional users table may be added, with a primary-key ID, email, first and last name, and vehicle type.
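Here's a minimal sketch of that schema, using SQLite for illustration. The exact column names, types, and per-size rate columns are assumptions based on the description above; a production system would more likely run PostgreSQL.

```python
# Illustrative schema sketch in SQLite; names and types are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE garage (
    id INTEGER PRIMARY KEY,
    zip_code TEXT NOT NULL,
    rate_compact REAL,          -- rates vary by vehicle size
    rate_standard REAL,
    rate_oversized REAL
);

CREATE TABLE spot (
    id INTEGER PRIMARY KEY,
    garage_id INTEGER NOT NULL REFERENCES garage(id),
    status TEXT CHECK (status IN ('reserved', 'unavailable', 'empty'))
);

CREATE TABLE reservation (
    id INTEGER PRIMARY KEY,
    garage_id INTEGER NOT NULL REFERENCES garage(id),
    spot_id INTEGER NOT NULL REFERENCES spot(id),
    start_time TEXT NOT NULL,
    end_time TEXT NOT NULL,
    paid INTEGER NOT NULL DEFAULT 0   -- payment status
);
""")
conn.commit()
```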
Remember the potential trade-offs, such as flexibility versus enums in the vehicle type column and load balancing versus maintaining consistency.
Consider using an existing third-party payment system instead of developing one in-house to save time and resources.
To design a minimum viable product (MVP) for a two-sided network like Twitter, focus on tweet creation, timeline generation, and allowing users to follow others and interact with tweets.
The corresponding API endpoints would be handled by multiple servers behind load balancers.
For timeline generation, active users' timelines are updated using a stack-like approach, adding daily tweets and considering user engagement for optimal timeline generation.
Influencers require a more dynamic approach, with data added to timelines on the fly when requested.
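A rough sketch of that hybrid approach is shown below. The in-memory structures stand in for real caches and databases, and the follower threshold is an assumption for illustration.

```python
# Hybrid timeline sketch; data stores and thresholds are illustrative assumptions.
from collections import defaultdict

FANOUT_THRESHOLD = 10_000              # accounts above this follower count are "influencers"

followers = defaultdict(set)           # author_id -> set of follower ids
timelines = defaultdict(list)          # user_id -> precomputed list of tweet ids
influencer_tweets = defaultdict(list)  # influencer_id -> their recent tweet ids

def post_tweet(author_id: int, tweet_id: int) -> None:
    """Fan out on write for regular users; store influencer tweets for read-time merge."""
    if len(followers[author_id]) > FANOUT_THRESHOLD:
        influencer_tweets[author_id].append(tweet_id)
    else:
        for follower_id in followers[author_id]:
            timelines[follower_id].append(tweet_id)

def get_timeline(user_id: int, following: set) -> list:
    """Merge the precomputed timeline with influencer tweets at read time."""
    merged = list(timelines[user_id])
    for author_id in following:
        if len(followers[author_id]) > FANOUT_THRESHOLD:
            merged.extend(influencer_tweets[author_id])
    return merged[-50:]                # return the most recent items
```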
High availability is maintained through multiple servers and databases set up in a master-slave configuration.
Twitter must optimize for reads and high availability as a read-heavy system while ensuring influencers cannot bring the system down.
Additional non-functional requirements include eventual consistency, low latency, and high availability.
Interaction with multimedia and text content is similar, and the system needs algorithms that prioritize user engagement so it can surface the subset of tweets generating the most interaction.
Consider potential security issues such as DDoS attacks. Login flows and rate limitation procedures can prevent DDoS attempts aimed at the system, ensuring its safety.
When making this application scalable, have an API server read from a separate cache for the newsfeed.
You should also use a feed service to refresh the feed cache regularly.
Use blob storage services like Amazon S3 or Google Cloud Storage to handle static content. This will give you fast response times and high availability.
Use a strong SQL database management system like PostgreSQL that works well with large-scale indexing.
Using a cache for frequently accessed content keeps it close at hand and gives you faster response times. In-memory caching with Redis or Memcached for metadata can also improve performance.
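For example, a cache-aside pattern for post metadata might look like the sketch below, assuming the redis-py client; the key names, TTL, and database stub are illustrative.

```python
# Cache-aside sketch for post metadata using Redis; keys and TTL are assumptions.
import json
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 300

def fetch_post_from_db(post_id: int) -> dict:
    """Placeholder for a real database query."""
    return {"id": post_id, "caption": "hello world", "likes": 42}

def get_post(post_id: int) -> dict:
    key = f"post:{post_id}"
    cached = cache.get(key)
    if cached is not None:                 # cache hit
        return json.loads(cached)
    post = fetch_post_from_db(post_id)     # cache miss: go to the database
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(post))
    return post
```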
Balance the trade-offs between consistency, availability, and partition tolerance. The CAP theorem can help you decide which factors are most important in your application. A load balancer and sharding can help balance the load on the database server.
Use a content delivery network (CDN) for low-latency delivery of globally distributed media.
With an estimated 200 million users, optimizing for a global audience is a key part of the design process.
To store compressed video files and user metadata, you need a highly available system that responds quickly and can scale to support more users. With one million daily active users, the system must be able to handle that level of traffic.
An API can be designed with endpoints such as "Upload Video" and "Get User Activity."
A relational database system like PostgreSQL can store and link structured user data objects.
To reduce the load on the database and improve latency, preloading a cache of the top ten videos for each user can be implemented.
To associate user activity, such as liking and following, with specific videos, a video `UUID` and `ID` field is required.
User activity can be stored in a database, and an API endpoint can be developed for a `GET` request to return a list of the user's likes and followers.
To handle a 10x increase in traffic, a Content Delivery Network (CDN) can cache and route video content traffic to the closest node.
A load balancer can also be used for scaling deployments and performing zero downtime deployments.
A regional database sharding service can distribute load between databases and read-only replicas.
A pre-caching service can be implemented to manage `GET` requests and preload video content.
The design of Uber Eats requires attention to both functional and non-functional requirements.
First, define the needs of all stakeholders – restaurants, customers, and delivery people.
The top priority is to design a system that allows restaurants to add their information and customers to view and search for nearby restaurants based on delivery time and distance. This system requires eventual consistency to ensure the accuracy of restaurant information.
High availability, security, scalability, and latency are all critical non-functional requirements. The expected numbers of daily views, customers, and restaurants must be taken into account.
Successful design involves data modeling for restaurants and menu items, including geohashing for efficient proximity searches.
To optimize proximity comparisons between locations such as restaurants and customers, the system subdivides the world into a grid and encodes each cell as a base-32 string (a geohash), so that nearby locations share a common prefix and are easy to match.
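For illustration, here's a simplified encoder following the standard geohash algorithm; it's a sketch to show how latitude/longitude pairs map to base-32 strings, not a production implementation.

```python
# Simplified geohash encoder; precision and coordinates below are illustrative.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    even = True                      # alternate bits: longitude first, then latitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):  # pack every 5 bits into one base-32 character
        value = 0
        for bit in bits[i:i + 5]:
            value = (value << 1) | bit
        chars.append(BASE32[value])
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744))  # nearby points share a common prefix
```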
Data modeling involves using relational and NoSQL databases for different data tables and optimizing the user experience for uploading new restaurant information.
Employing an Elasticsearch service on top of Cassandra for sharding and scaling, with separate databases where optimization calls for them, is also important.
Multiple services such as a search service, viewing service, and restaurant service could be employed for efficiency, with the search service incorporating ElasticSearch and geohashing.
Isochrones and polygons help estimate delivery times and identify delivery areas.
The viewing service could include caching frequently viewed menus and leveraging isochrones to estimate customer delivery times.
The restaurant service could track all events using Kafka queues and implement an API for moderation, optimizing the system's efficiency.
Scalability, optimization, and customer experience are critical factors in Uber Eats' design. By considering all stakeholders' requirements, you can develop an efficient system that provides a seamless experience.
"I recently completed the Google engineering manager interview loop. These were regular system design interviews with different engineers and teams.
To prepare for the interviews, I watched a lot of mock interviews from Exponent. I also read some books and practiced answering system design questions in Google Docs. I practiced writing solutions for 3-4 systems, including Google Drive, Instagram, a hotel booking system, Google Maps, an analytics system, and blob storage.
A coding interview round was also evaluated by an L6 engineering manager. They advised me to spend time understanding which database to choose.
I recommend checking out Alex Xu's system design database table and use cases on Twitter. Spend an evening learning about all the different use cases for these database types. Google likes to ask detailed questions about database selections.
Additionally, I reviewed all of the databases used by Google, including Bigtable, Spanner, Firestore, and BigQuery. This gave me a few more points with the interviewer since I approached the problems with their internal tech, not just AWS or Azure. This was probably overkill, but it helped me feel more prepared."
During an Amazon system design interview, a big focus will be on behavioral questions based on Amazon's Leadership Principles.
However, the interview will also evaluate your technical, functional job fit, specifically in system design.
Focus on the big picture rather than becoming an expert on the specific system they want you to create.
Whether you come from a FinTech or HealthTech background, Amazon will likely ask you to design an Amazon-type product. This could be Alexa or Amazon Prime.
Focus on the fundamentals that create a cohesive experience across different layers required for a complex environment to work.
During the interview, you may be asked to optimize your solution or test different parameters to see how you adjust the scope and handle unforeseen circumstances.
The last part of this guide is a breakdown of the fundamental principles and concepts of designing scalable systems.
Network and web protocols are the rules and standards that govern how information is transmitted over the internet.
Refresh your knowledge of key web protocols—such as HTTP/HTTPS, TCP, UDP, DNS, and WebSockets—before your interview.
In a relational database (RDBMS), data is typically stored in tables with rows and columns.
A normalization process stores data with 1-to-N or N-to-N relationships in separate tables joined by Foreign Keys.
This ensures that the data in these tables are consistent and can be joined for a complete view of the data.
As data size increases, traditional database systems face CPU, memory, or disk usage bottlenecks that require high-end and expensive hardware.
However, even with top-quality hardware, most successful modern applications require more data than a traditional RDBMS can handle.
Sometimes, a large table is split into horizontal data partitions. Each partition contains a subset of the whole table and is stored on a separate database server.
This process is called sharding. Each partition is called a shard.
The technique used to partition data often depends on the data's structure.
Some common sharding techniques include:
This technique partitions data based on the user's location, such as their continent of origin or a similarly large area (e.g., "East US," "West US").
This technique allows users to be routed to the node closest to their location, reducing latency.
Bottlenecks: There may not be an even distribution of users in the various geographical areas.
Range-based sharding divides the data based on the ranges of the key value.
For example, selecting the first letter of the user's first name as the shard key divides the data into 26 buckets (assuming English names).
Bottlenecks: This simplifies partition computation but can lead to uneven splits across data partitions. In this example, more users have names starting with the letter A than Z.
This technique uses a hashing algorithm to generate a hash from the key value. It then computes the partition using the hash value.
A good hash algorithm distributes data evenly across partitions, reducing the risk of hotspots.
Bottlenecks: It can assign related rows to different partitions, so the server cannot enhance performance by predicting and pre-loading future queries.
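A minimal sketch of key-based (hash) sharding follows, with an assumed shard count. Note that a simple modulo scheme reshuffles most keys when the shard count changes, which is why consistent hashing is often used instead.

```python
# Hash-based sharding sketch; shard count and key format are illustrative assumptions.
import hashlib

NUM_SHARDS = 8

def shard_for_key(key: str) -> int:
    """Map a key (e.g., a user ID) to a shard using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for_key("user:12345"))   # always routes the same key to the same shard
```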
Consider the pros and cons of sharding techniques before suggesting one in an interview.
Load balancing is a technique used to distribute incoming traffic across multiple servers or resources to ensure that no single server becomes overloaded and unable to handle the traffic.
It allows a system to scale horizontally, meaning it can handle a larger workload by adding more servers or resources rather than relying on a single, powerful server.
Load balancers are essential to many modern technical systems and frequently come up in system design interviews.
Load balancers can use two types of algorithms: static (such as round robin) and dynamic (such as least connections).
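For illustration, here's a sketch contrasting a static strategy (round robin) with a dynamic one (least connections); the server names and connection counts are assumptions.

```python
# Load-balancing strategy sketch; servers and connection counts are illustrative.
import itertools

servers = ["app-1", "app-2", "app-3"]

# Static: round robin cycles through servers regardless of their current load.
round_robin = itertools.cycle(servers)

def pick_round_robin() -> str:
    return next(round_robin)

# Dynamic: least connections picks the server with the fewest active connections.
active_connections = {"app-1": 12, "app-2": 3, "app-3": 7}

def pick_least_connections() -> str:
    return min(active_connections, key=active_connections.get)

print(pick_round_robin())        # app-1
print(pick_least_connections())  # app-2
```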
Content Delivery Networks (CDNs) are a distributed network of servers that deliver content, such as web pages, web documents, images, and videos, to users based on their geographic location.
CDNs replicate content across a network of servers located in strategic locations around the world. When a user requests content, the CDN determines the user’s location and subsequently directs the request to the server that is closest to the user. In doing so, CDNs reduce latency and improve the overall user experience.
According to the CAP theorem, a distributed database system cannot simultaneously guarantee all three of the following properties: consistency, availability, and partition tolerance.
Instead, a system must choose between consistency and availability in the face of network partitions.
Databases are a critical component of technical systems and will inevitably be involved in your system design interviews.
A database is a structured collection of data that is stored and accessed electronically.
There are many kinds of databases to choose from when designing systems, including relational, key-value, document, wide-column, and graph databases.
Understand the trade-offs between different database technologies and how to choose the best database for a particular application.
Caching stores frequently accessed data in a temporary storage location, typically in memory, to improve the performance of a system.
Caching is commonly used in system design because it can significantly improve the speed at which a system retrieves data. Several types of caching are used often, including client-side (browser) caches, CDN caches, application-level in-memory caches like Redis or Memcached, and database caches.
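As an example of the LRU eviction policy mentioned earlier in this guide, here's a minimal cache sketch; the capacity and keys are illustrative.

```python
# Minimal LRU cache sketch using OrderedDict; capacity is an illustrative assumption.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int = 3):
        self.capacity = capacity
        self.items = OrderedDict()           # keys ordered from least to most recently used

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used item

cache = LRUCache(capacity=2)
cache.put("a", "1")
cache.put("b", "2")
cache.get("a")
cache.put("c", "3")       # evicts "b", the least recently used key
print(list(cache.items))  # ['a', 'c']
```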
System design interview questions are similar to coding questions in that they are fundamentally technical.
However, they differ in a few key ways: they're open-ended, collaborative, and focused on trade-offs rather than a single correct answer.
You will need knowledge and comfort with diverse technologies to effectively answer these interview questions.
Engineers, for example, will need to elaborate deeply on the systems within their areas of expertise.
However, management roles, such as TPM, need a much broader knowledge of the systems and technologies they use.
These are some of the most commonly asked questions around prepping for these tough interviews.
Yes and no. Amazon asks system design questions in their engineering interviews.
However, they don't ask these types of questions to freshers and recent graduates. System design questions are usually only asked in interviews for experienced positions (4-5 years of experience).
Yes, Google asks system design questions, but your initial phone screens won't include them. Instead, you'll be asked about algorithms and data structures. You'll encounter system design questions if you advance to later interview rounds.
To pass the Google system design interview, focus on your whiteboarding skills.
System design interview questions are notoriously difficult to prepare for. Unlike algorithmic questions, they don't reduce to a handful of prescribed patterns. Instead, they require years of technical knowledge and experience to answer well.
For junior engineers, this can be tricky. Even senior developers sometimes find themselves scratching their heads trying to understand a system.
The key to doing well in these types of interviews is to use your entire knowledge base to think about scalability and reliability in your answer.
The high-level design focuses on the problem you're trying to solve. The low-level design breaks down how you'll actually achieve it. That includes breaking down systems into their individual components and explaining the logic behind each step. System design interviews focus on both high-level and low-level design elements so that your interviewer can understand your entire thought process.