Anthropic AI Safety Fellow Interview Guide

The Anthropic AI Safety Fellow interview is calibrated to full-time research engineer hiring standards. The loop bars LLM use in every live stage even though the role sits at an AI lab, runs two-part coding rounds in which most candidates finish only the first part, and filters aggressively at the online assessment.

The fellowship is a 4-month, paid research position that can convert to a full-time role.

This guide breaks down each stage of the Anthropic AI Safety Fellow interview process, what interviewers look for, and how to prepare with real example questions, actionable tips, and resources.

Anthropic AI Safety Fellow interview process

The Anthropic AI Safety Fellow interview runs five stages that prioritize implementation speed, live clarification instincts, and research fluency.

After the written application, expect a scored CodeSignal assessment, a live coding screen with an Anthropic engineer, and a final loop that pairs a Colab-based LLM coding round with a short open-ended research brainstorm.

Here's an example of what the process can look like:

Application screen: A written application covering motivation, research interests, team fit, and three required references. Only candidates who clear this screen are invited to the online assessment.
CodeSignal online assessment: A 90-minute systems coding round across four progressively harder stages, scored out of 1,000
Coding screen with an Anthropic engineer: A live, 90-minute CodeSignal round with a logic-heavy implementation prompt that requires careful clarification before coding
Final loop: A 55-minute "Prompting and Engineering with LLMs" coding round in a Google Colab notebook, followed by a 15-minute open-ended research brainstorm with a potential mentor
Reference checks: Three references submitted with your application, whom Anthropic may contact at any point during the loop, including live calls during the final round

Anthropic moves quickly between rounds. Once you advance, expect to schedule the next interview within 3 days.

The fellowship is remote-friendly, with designated workspaces in Berkeley and London where mentors visit in person. Work authorization in the US, UK, or Canada is required; Anthropic doesn't sponsor visas.

Application screen

The Anthropic AI Safety Fellow application screen is a written submission that gates the rest of the loop. Only candidates who clear it move on to the CodeSignal online assessment.

The application asks for substantive written responses on your motivation, research interests, and team fit, alongside a resume, optional code samples and publications, and three required references.

Reviewers look for:

Research direction: Whether you can articulate a specific AI safety or alignment research area you're excited about and tie it to a chosen Anthropic team
Team fit: How clearly your background and interests map to one of the fellowship's workstreams and Anthropic's research culture
Program commitment: Your likelihood of accepting a full-time offer if extended and continuing in AI safety or security work after the fellowship
Logistical fit: Confirmation that you can work full-time, preferably in person from the Berkeley or London workspaces, with appropriate work authorization

Recently asked questions

Recent applicants have been asked:

Why are you interested in participating in the Fellows program?
Tell us briefly about one or more research areas you're excited about right now, and why.
How likely are you to accept a full-time offer at Anthropic if you receive one after the Fellows program?
How likely are you to be interested in continuing to work on AI safety or security after the Fellows program?
Please share context on each reference: their background, your relationship, what you worked on together, and how closely you collaborated.

CodeSignal online assessment

The Anthropic AI Safety Fellow CodeSignal online assessment is a 90-minute automated coding round built around a single system that extends across four progressively harder stages. Each stage unlocks the next, with points weighted evenly at 250 per stage for a maximum of 1,000. Speed and fluency matter as much as correctness.

The first stage typically starts at a level most candidates can clear quickly, and the difficulty ramps sharply by the third and fourth stages, where many candidates run out of time before implementing the full feature set. Your code is validated against a set of test cases inside CodeSignal, and you can iterate on failing cases before advancing.

The assessment rewards:

Implementation fluency: How quickly you can translate a clear spec into working code without getting stuck on structure or syntax
Test-case discipline: Whether you read failing test output carefully and patch edge cases instead of rewriting whole blocks
Incremental design: How well your early-stage code accommodates later-stage extensions without a full rewrite
Data structure judgment: Whether you pick lightweight, appropriate structures for each new requirement layered onto the system
Time management: How you allocate the fixed window across stages, including knowing when to move on from a partial solution

Recently asked questions

The Anthropic AI Safety Fellow online assessment draws from implementation-style prompts rather than standard algorithmic challenges. Recent candidates have been asked to:

Implement an in-memory database that starts with basic key-value lookup and layers on features like time-to-live (TTL) expiration across later stages.
Build a bank account system that manages accounts for a simulated bank, with later stages adding balance tracking, scheduled transactions, and interest calculation.
Design a file system with progressive complexity layered across stages.
Build a cloud database with progressive complexity layered across stages.

Architectural choices made early carry real weight into the harder stages.
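
To make the staged format concrete, here is a minimal practice sketch of a key-value store whose first-stage design already leaves room for a later TTL stage. The class name, method signatures, and the choice to pass the current timestamp in as `now` are all assumptions made for illustration; the actual assessment spec is not public and may differ.

```python
from typing import Dict, Optional, Tuple


class InMemoryDB:
    """Hypothetical practice store; not the real assessment interface."""

    def __init__(self) -> None:
        # key -> (value, expires_at); expires_at is None when the key never expires.
        self._data: Dict[str, Tuple[str, Optional[int]]] = {}

    def set(self, key: str, value: str, now: int, ttl: Optional[int] = None) -> None:
        # Storing the absolute expiry time keeps reads to a single comparison.
        expires_at = now + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str, now: int) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self._data[key]  # lazy expiration: drop the key when it is next read
            return None
        return value

    def delete(self, key: str, now: int) -> bool:
        # Report True only if a live (unexpired) key was removed.
        existed = self.get(key, now) is not None
        self._data.pop(key, None)
        return existed


db = InMemoryDB()
db.set("session", "abc", now=0, ttl=10)
db.set("config", "v1", now=0)
```

Keeping the expiry slot in the stored tuple from the start means a TTL stage only touches `set` and `get`, rather than forcing a rewrite of the storage layout.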

Coding screen with an Anthropic engineer

The Anthropic AI Safety Fellow coding screen is a 90-minute live round conducted on CodeSignal with an Anthropic engineer or research scientist. The round centers on a logic-heavy implementation prompt with significant supporting context, and the format is one main implementation task plus a follow-up extension.

Interviewers may probe how you scope the prompt before touching code, since the written question is easy to misread on a first pass. If you jump straight into coding without clarifying constraints, expect to rewrite substantial portions midway through the round.

Interviewers look for:

Prompt comprehension: How carefully you parse a dense spec and separate core requirements from surrounding context
Clarifying questions: Whether you surface scope, constraints, and edge cases through targeted questions before writing code
Communication under pressure: How clearly you narrate your approach while working, especially when the second part of the round shifts the problem
Time discipline: Whether you can move quickly through the implementation and debug errors as they come up, rather than running out the clock on a partial solution
Reasoning over correctness: How well you articulate logic when time runs short, even if the final implementation isn't fully working

Recently asked questions

In a recent candidate's experience, the live coding screen prompt was:

Implement a streaming database, with a diagram accompanying the written spec and a follow-up extension that builds on the initial implementation.

Prompting and engineering with LLMs round

The Anthropic AI Safety Fellow final loop opens with a 55-minute live coding session conducted in a Google Colab notebook with GPU access. The round runs in two parts: an implementation task that has you complete a missing piece of an LLM inference pipeline, followed by an open-ended discussion on how you reason through prompt engineering, hallucination handling, and prompt strategy in production systems.

Environment setup is part of the round's discipline. Anthropic may instruct you to test the Colab notebook and confirm GPU execution before the interview starts, since debugging environment issues inside the 55 minutes eats into your working time. Despite the round's subject matter, LLM assistance is not permitted, here or in any other live stage of the loop.

Most candidates only finish the first part within the 55-minute window. Treat the implementation task as the priority and the prompt engineering discussion as a stretch goal you can earn time for by moving cleanly through Part 1.

Interviewers look for:

LLM implementation fluency: How comfortably you read and extend code tied to model inference and output processing
Debugging intuition: Whether you can trace through partially working code, identify what's missing, and match your implementation to the interviewer's stated expectations for each function
Prompt engineering reasoning: How clearly you reason through prompt design choices, including few-shot vs. zero-shot tradeoffs and how to handle hallucinations or unreliable model outputs in production
Environment readiness: How seriously you take the pre-interview setup instructions, including confirming GPU execution before the session
AI safety reasoning: Whether your design choices reflect genuine concern for AI safety rather than purely technical optimization

Recently asked questions

Recent candidates have been asked to:

Complete a missing implementation within 20 to 30 lines of skeleton code spanning two functions, so that the model's output is processed correctly after inference.
Handle hallucinations or unreliable model outputs in a production setting, including how you'd detect them and what you'd do when they occur.
Design and evaluate a prompt that uses an LLM as a classifier, including how you'd test the output for accuracy.
Walk through few-shot vs. zero-shot prompting, including when you'd choose each and how the choice affects model behavior in practice.
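
The classifier, hallucination, and few-shot questions above can be rehearsed together with a small harness like the one below. It is only a sketch under stated assumptions: `fake_model` is a stand-in for a real LLM call, and the label set, prompt wording, and function names are hypothetical rather than anything taken from the interview.

```python
from typing import Callable, Iterable, List, Optional, Tuple

LABELS = ("positive", "negative")  # hypothetical label set


def build_prompt(text: str, examples: Iterable[Tuple[str, str]] = ()) -> str:
    """Zero-shot when `examples` is empty, few-shot otherwise."""
    parts = [
        f"Classify the review as one of: {', '.join(LABELS)}.",
        "Answer with the label only.",
    ]
    for ex_text, ex_label in examples:
        parts.append(f"Review: {ex_text}\nLabel: {ex_label}")
    parts.append(f"Review: {text}\nLabel:")
    return "\n\n".join(parts)


def parse_label(raw: str) -> Optional[str]:
    """Accept only a known label; anything else is flagged as unreliable output."""
    cleaned = raw.strip().lower().rstrip(".")
    return cleaned if cleaned in LABELS else None


def evaluate(
    model: Callable[[str], str],
    dataset: List[Tuple[str, str]],
    examples: Iterable[Tuple[str, str]] = (),
) -> Tuple[float, float]:
    """Return (accuracy, invalid_rate) over a labeled dataset."""
    examples = list(examples)
    correct = invalid = 0
    for text, gold in dataset:
        pred = parse_label(model(build_prompt(text, examples)))
        if pred is None:
            invalid += 1  # off-format or invented label: count it, never guess
        elif pred == gold:
            correct += 1
    return correct / len(dataset), invalid / len(dataset)


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call (assumption); keys off the final review only."""
    review = prompt.rsplit("Review: ", 1)[1]
    if "great" in review:
        return "Positive."
    if "awful" in review:
        return "negative"
    return "I think it might be mixed"


dataset = [
    ("great phone", "positive"),
    ("awful battery", "negative"),
    ("it exists", "negative"),
]
accuracy, invalid_rate = evaluate(fake_model, dataset)
```

Tracking the invalid rate separately from accuracy gives you something concrete to say about unreliable outputs: you can compare zero-shot and few-shot prompts on both numbers and decide whether off-format answers should be retried, rejected, or escalated.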

Research brainstorm

The Anthropic AI Safety Fellow final loop closes with a 15-minute research brainstorm with a potential research supervisor or mentor. The session is fully open-ended, there's no right or wrong answer, and the interviewer typically won't signal whether you're heading in a useful direction while you think out loud.

Expect prompts built around Anthropic's active AI safety and alignment research, with space to propose your own research angle within the prompt. The 15-minute window is the constraint that defines the round. You have to move quickly from prompt to framing to at least one substantive research idea, and you need to be able to defend your reasoning without mid-round feedback to course-correct.

Interviewers look for:

Research fluency: How comfortably you frame an open-ended alignment or safety prompt into a tractable research question
Familiarity with Anthropic's work: Whether your ideas show real grounding in the alignment research Anthropic is actively publishing
Thesis articulation: How clearly you define your proposed direction and explain why it matters, since 15 minutes leaves little room for buildup
Technical depth: Whether your proposed approaches hold up to a researcher's scrutiny of LLM behavior, training, and ML evaluation
Intellectual independence: How well you commit to and defend a direction when the interviewer offers no signal either way

Recently asked questions

In a recent candidate's experience, the prompt focused on alignment:

How would you prevent bad actors from misaligning an LLM for harmful use cases?
How would you detect misalignment?
How would you train models to be more robustly aligned with intended objectives?

The discussion stayed within that open-ended alignment framing for the full 15 minutes.

Reference checks

The Anthropic AI Safety Fellow loop includes a structured reference check. Anthropic asks for three references in the application form and may contact them at any point during the loop without notifying you, with most outreach happening during the final round.

Anthropic prefers references from the ML research community when possible and looks for collaborators who can speak concretely to your strengths and weaknesses on technical work. Brief your references in advance on the fellowship, your application content, and the kinds of projects you've described, so they can speak fluently on those topics if Anthropic reaches out on short notice.

Anthropic looks for:

Technical credibility: Whether your references can speak concretely to the work you've done and the impact of your contributions
Collaboration signal: How well-positioned your references are to comment on how you operate in research or engineering teams
Research community standing: Whether your references include collaborators from the ML research community or adjacent technical fields
Strength and weakness clarity: Whether your references can articulate both where you excel and where you have room to grow
Responsiveness: Whether your references reply promptly when Anthropic reaches out, since the loop moves quickly

How to prepare for the Anthropic AI Safety Fellow interview

Prioritize implementation speed: The CodeSignal rounds reward coding fluency, so practice writing clean, working implementations of small systems quickly rather than memorizing algorithmic patterns.
Build fluency with data-structure-heavy systems coding: Both coding rounds center on building small systems that extend across requirements, including in-memory stores and streaming data designs. Practice implementing these from scratch, layering in features like expiration, eviction, and streaming updates.
Clarify dense prompts before you start coding: The live engineer screen carries a prompt dense enough that 10-15 minutes of clarification is expected. Build the habit of reading the full spec, asking about scope and constraints, and confirming your understanding before writing a line of code.
Sharpen your prompt engineering reasoning: Part 2 of the prompting round leans on how you handle hallucinations, design prompts for production use, and reason through few-shot vs. zero-shot tradeoffs. Practice articulating your prompt design choices clearly, since the discussion rewards reasoning more than implementation depth.
Read Anthropic's active alignment and safety research: The 15-minute brainstorm rewards real familiarity with what Anthropic is publishing. Work through recent papers and posts on alignment, misuse prevention, and model evaluation so you can frame research ideas that land with a mentor on that team.
Practice with mock interviews: Simulate the format with a partner or coach who can run you through implementation-style prompts on a timer, push back on clarifying questions, and pressure-test how you narrate your approach when you hit a wall.
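
For the systems-coding practice suggested above, a timed drill might start from something like this least-recently-used eviction sketch. The class and method names are hypothetical, and LRU is just one eviction policy chosen for illustration; nothing here is confirmed to match an actual interview prompt.

```python
from collections import OrderedDict
from typing import Optional


class LRUStore:
    """Hypothetical bounded store that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        # Insertion order doubles as recency order: oldest first, newest last.
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


store = LRUStore(capacity=2)
store.put("a", "1")
store.put("b", "2")
store.get("a")       # "a" becomes the most recently used key
store.put("c", "3")  # over capacity, so "b" is evicted
```

A useful follow-up drill is to add a second requirement on a timer, such as per-key expiration, and see how much of the class survives unchanged.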

About the Anthropic AI Safety Fellow role

The Anthropic AI Safety Fellowship is a 4-month, paid research position that sits between an internship and a full-time role. Fellows work on a defined research project under a senior Anthropic research mentor, contribute at the capacity of a full-time researcher or engineer, and can convert to a permanent role if the work lands.

Anthropic AI Safety Fellows typically work on:

Mentored research projects: Defined research work paired with a senior Anthropic researcher or engineer, with the expectation of a completed project by the end of the 4 months
Full-time-equivalent output: Research expectations match Anthropic's permanent research engineers and research scientists, with full ownership of a defined project across the fellowship
AI safety and alignment work: Projects anchored to Anthropic's active safety research agenda, including alignment, misuse prevention, and model evaluation
Conversion pathway: A defined path to a permanent research engineer or research scientist role for fellows whose work meets Anthropic's bar during the program

Anthropic AI Safety Fellow experience requirements

Anthropic designed its AI Safety Fellow program for mid-career technical professionals transitioning into AI safety research. Strong candidates at any career stage are welcome to apply.

Past Fellows have come from physics, mathematics, computer science, and cybersecurity backgrounds.

Candidates in past cohorts typically had:

Strong Python skills: Fluency in Python and a demonstrated ability to make concrete progress on ambiguous technical problems
Technical research background: Experience in machine learning, software engineering, or AI safety research
AI safety motivation: A clear interest in reducing catastrophic risks from advanced AI systems and transitioning into empirical AI safety research

FAQs about the Anthropic AI Safety Fellow interview

How much does an Anthropic AI Safety Fellow make?

Anthropic publishes a base stipend of approximately $61,600 for the 4-month fellowship, plus up to $60,000 in compute funding to support fellows' research projects.

The fellowship is structured as a stipend rather than salary, so the compensation doesn't include equity, bonuses, or standard full-time employee benefits. Fellows who convert to full-time roles after the program move onto Anthropic's standard compensation structure for research engineers or research scientists.

Can you use LLMs during the Anthropic AI Safety Fellow interview?

Anthropic's published candidate AI guidance allows LLM use during the application phase but bars it during live assessments and take-home work. Use Claude or another LLM to refine your written application responses after drafting them yourself, but expect to work without AI assistance during the CodeSignal online assessment, live coding screen, research brainstorm, and prompting round.

Where is the Anthropic AI Safety Fellowship located?

The Anthropic AI Safety Fellowship is remote-friendly, with designated workspaces in Berkeley, California and London, UK where mentors visit in person. Anthropic prefers Fellows work from one of those two locations when possible, but accommodates remote work for Fellows whose setup requires it. Work authorization is required in the US, UK, or Canada, and Anthropic doesn't currently sponsor visas.

How competitive is the Anthropic AI Safety Fellowship?

The Anthropic AI Safety Fellowship is highly selective. One recent finalist reported that roughly 150 candidates completed all interview rounds for an estimated 32 spots, which would put the acceptance rate among finalists near 21%, though Anthropic doesn't publish official acceptance numbers. The application form, online assessment, and live coding rounds each function as serious gating stages.

Can the Anthropic AI Safety Fellowship lead to a full-time role?

The Anthropic AI Safety Fellowship can convert to a permanent position at Anthropic, but conversion is not guaranteed. Anthropic reports that 25-50% of fellows across cohorts received a full-time offer; the first cohort exceeded 40%. Strong performers typically convert into research engineer or research scientist roles. Fellows who don't convert have gone on to safety research roles at other organizations.
