How to Create a Take-home for Defined Tasks
In this lesson, we’ll walk through a 5-step framework you can use to complete a strong take-home assignment for defined tasks. We’ll also teach you how to structure the write-up when submitting your assignment.

- Step 1: Define goals. Convert the given problem into a data science task.
- Step 2: Understand your data. Examine the type of data you’re working with, understand basic patterns in the data, and relate the datasets to the goal.
- Step 3: Problem-solve. List the possible approaches, define success, make tradeoffs, and prioritize 1-2 approaches.
- Step 4: Execute with code. Write the code to execute your plan, ensuring that the code is effective, correct, simple, efficient, and readable.
- Step 5: Summarize. Condense your work and put the “bottom line up front.”
Let's go through an example together:
Say you’re given a dataset containing visits to a location and you’ve been asked ‘How many visitors do you expect in 3 months?’ In this case, you’ll want to include a forecast or prediction in your report.
Plan A may be to conduct time-series forecasting, plan B may be an ML-based approach, and plan C may be to visualize the trends to see if there’s something that stands out.
Step 1: Define your goals
The first and most important thing to do is to convert the given problem into a data science task.
Start by brainstorming and planning the analysis and evidence needed to answer the task. In addition to having a plan A, try to think of a few different approaches here.
When a task is defined for you, make sure your analysis addresses the problem as stated. Your work should produce clear evidence, results, or ideas that answer it.
Step 2: Understand the data
There are a few steps and guiding questions you can use to understand any dataset you’re given. We’ve categorized these steps into three buckets: basic, open-ended, and problem-focused.
Basic: Understand the data you’re working with
- What are you given?
  - How many datasets are you given?
  - How large are the files or sources?
  - How do they relate to each other?
- What do they contain?
  - What is the “unit” of data? (What does each row represent? e.g. a record, a transaction, a person)
  - What are the unique identifiers, or “keys,” of the data?
  - What does each column mean? If a label is unfamiliar, try Googling the name to see if it’s a domain/industry standard acronym or something deliberately opaque.
- How clean is the data?
  - Are there any duplicates? What’s missing in the data?
  - Are dates, names, or numbers properly formatted?
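The basic checks above can be sketched in a few lines of pandas. This is a minimal illustration, not a complete audit: the `visits` table, its column names, and the `basic_checks` helper are all hypothetical and stand in for whatever dataset you are given.

```python
import pandas as pd

# Hypothetical visits table; in practice this would be loaded from the provided files.
visits = pd.DataFrame(
    {
        "visit_id": [1, 2, 2, 3, 4],
        "visit_date": ["2024-01-01", "2024-01-02", "2024-01-02", "not a date", "2024-01-04"],
        "visitors": [120, 135, 135, None, 150],
    }
)


def basic_checks(df, key, date_col):
    """Return a small dictionary of data-quality facts for one table."""
    # Values that cannot be parsed as dates become NaT instead of raising.
    parsed_dates = pd.to_datetime(df[date_col], errors="coerce")
    return {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        "duplicate_rows": int(df.duplicated().sum()),
        "key_is_unique": bool(df[key].is_unique),
        "missing_per_column": df.isna().sum().to_dict(),
        "unparseable_dates": int(parsed_dates.isna().sum()),
    }


report = basic_checks(visits, key="visit_id", date_col="visit_date")
```

On this toy table, the report flags one duplicated row, a non-unique key, one missing `visitors` value, and one date that could not be parsed.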
Open-ended: Understand patterns in the data
- What trends are notable from the data? Look at aggregate metrics, such as counts over time, counts by category, etc., depending on what the dataset looks like.
- What trends or basic explorations can you visualize? (optional)
- What doesn’t make sense in the data?
- What assumptions might you need to make?
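As one way to look at aggregate metrics, the sketch below computes counts over time and counts by category with pandas. The `events` table and its `visit_date` and `channel` columns are hypothetical; swap in the columns your dataset actually has.

```python
import pandas as pd

# Hypothetical event-level data: one row per visit.
events = pd.DataFrame(
    {
        "visit_date": pd.to_datetime(
            ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-14", "2024-02-28"]
        ),
        "channel": ["web", "app", "web", "web", "app"],
    }
)

# Counts over time: number of visits in each calendar month.
monthly_counts = events.groupby(events["visit_date"].dt.to_period("M")).size()

# Counts by category: number of visits per channel, largest first.
counts_by_channel = events["channel"].value_counts()
```

Tables like these are often enough to spot a trend, a seasonal pattern, or a category that dominates before you build any model.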
Problem-focused: Relate the datasets to the goal
- How does the data relate to the problem?
- Is the original analysis you thought of feasible? What other approaches could make sense?
- If it’s an ML problem, how will you partition the data into training, validation, and test sets? Do you need a validation set?
- Is there other data that could help you better answer your question? Is this publicly available? Can you use it to build a better solution for an “advanced” section? (optional, save for the end)
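For the partitioning question above, a time-ordered dataset such as the visits example is usually split chronologically rather than at random, so that later observations never leak into training. The sketch below assumes the rows are already sorted from oldest to newest; the function name and the 60/20/20 fractions are illustrative choices, not a rule.

```python
def chronological_split(rows, train_frac=0.6, val_frac=0.2):
    """Split time-ordered rows into train, validation, and test segments.

    Rows are assumed to be sorted oldest to newest. Nothing is shuffled,
    so the test segment always contains the most recent observations.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a non-empty test set")
    n = len(rows)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


# Ten hypothetical daily visitor counts, oldest first.
daily_visits = list(range(100, 110))
train, val, test = chronological_split(daily_visits)
```

If you do not need to tune hyperparameters, setting `val_frac=0` gives a plain train/test split.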
Synthesis
Once you’ve analyzed your data to understand it well, write 3-5 lines describing what you took away from the data. In particular, specify:
- Any assumptions made
- The atomic unit in the data
- If/how you merged or linked different datasets together
Preserve the exploration code and results in a separate notebook, which you can add to an “Appendix” in your submission.
Step 3: Problem-solve
Now, list potential approaches to solve this problem. Then define success, make tradeoffs, and prioritize 1-2 optimal approaches.
List the possible approaches
Depending on how “simple” the given problem is, this could be straightforward or fairly complex. Keep the options as simple as possible, and state in your analysis section that you deliberately chose the simpler route.
Taking the simpler approach reduces the room for implementation errors and the risk of running out of time, and it makes your work easier to interpret and build upon. This is often why, even once on the job, you might find yourself taking “less fancy” approaches.
For example, if it’s an ML problem, consider:
- What you'd want to predict
- What algorithms would be appropriate
- Whether you have enough data or features for a complex model
- Whether transparency matters, in which case a simple model works better
Define success
- Decide what matters most for the ultimate goal. This could be accuracy, precision, recall, a simple recommendation, etc.
- Make this success metric explicit in your work, and write 1-2 sentences on why this is the right metric to consider.
- If possible, define a baseline. For example, if you’re working on a forecasting problem, a simple average could be the baseline.
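To make the baseline idea concrete, here is a minimal sketch that uses a simple average as the forecast and scores it with root mean squared error (RMSE). The numbers are made up, and RMSE is only one possible success metric; use whichever metric you argued for above.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean_baseline_forecast(history, horizon):
    """Predict the historical mean for every future period."""
    mean_value = sum(history) / len(history)
    return [mean_value] * horizon


# Hypothetical monthly visitor counts: four months of history, two held out.
history = [100, 110, 120, 130]
holdout = [140, 150]

baseline_pred = mean_baseline_forecast(history, horizon=len(holdout))
baseline_rmse = rmse(holdout, baseline_pred)
```

Any model you build later should beat `baseline_rmse` on the same held-out data; if it does not, the added complexity is hard to justify.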
Make trade-offs and prioritize 1-2 approaches
- Since these choices are path-dependent, you might quickly be faced with quite a few options. Start by writing them all down in a list or tree.
- For each “fork in the road,” determine what choice would lead to a better outcome. For example, with tabular data, gradient-boosted trees tend to perform well when optimizing for model accuracy.
- Then, consider the trade-off(s). For example, a logistic regression might be less accurate but more transparent.
Write 4-5 lines on what decisions you’re making in your methodology, what approach you’ll pursue as plan A, and why.
Then, identify an alternative approach. This alternative could be something fancier, or something simpler that serves as a baseline. For each choice you make, write 1 line about the decision and 1 line about why.
If you have the time, consider doing a “baseline” (a simple approach to evaluate your solution’s effectiveness) and a “bonus” (a superior but more complicated approach).
Step 4: Execute with code
Write the code to pursue your plan A, plan B, and (optional) “bonus” approach.
When writing your code, ensure that it meets the following standards:
- Effectiveness: Does it solve the problem given? Is it clear what the answer is and why?
- Correctness: Does your code do what it’s supposed to do?
- Simplicity: Is your code simple to understand and write?
- Efficiency: Are there any areas where your code is inefficient? For example, are you doing vectorizable operations in loops?
- Readability: Is your code modular, well-formatted (especially if you’re submitting a Jupyter notebook), easy to read, and well-commented?
While this is the bulk of the work, it can take the least time of all the sections here since this is the most “defined” segment of the take-home.
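As an illustration of the efficiency standard, the sketch below computes the same quantity twice: once with an explicit Python loop and once as a vectorized NumPy operation. The visitor counts and the `ticket_price` value are hypothetical.

```python
import numpy as np

visitors = np.array([120, 135, 150, 160])  # hypothetical daily visitor counts
ticket_price = 12.5  # hypothetical price per visitor

# Loop version: correct, but slow on large arrays and harder to scan.
revenue_loop = []
for count in visitors:
    revenue_loop.append(count * ticket_price)

# Vectorized version: one expression applied to the whole array at once.
revenue_vectorized = visitors * ticket_price
```

Both versions give the same numbers; the vectorized one is shorter and typically much faster as the data grows.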
Step 5: Summarize
Condense all your work into concise insights and solutions. A well-written summary is critical, because it ensures that all your hard work is made clear to a reviewer.
To create a strong summary, use BLUF: put the “bottom line up front.” This technique helps you communicate the main insight or the “so-what” right at the beginning.
- To BLUF, tie your answer to your goal and summarize it in a sentence.
- Then add a sentence about what this enables or why it matters.
- Then explain the “how” by describing what steps you took to arrive at this conclusion.
For example:
"In this work, I’m able to forecast that Uber can expect around 100,000 ride requests (95% confidence interval [97,924, 103,144]) in the state of New York in September 2024.
- This would be roughly 5% higher than rides in previous months and 12% higher than rides in September 2023, meaning we should work closely with the supply-generation team to ensure there are enough drivers on the roads.
- I arrived at this forecast using a gradient-boosted regression tree (model root mean squared error (RMSE): 2,400) and the provided datasets, where ~20% of data was left out for model evaluation. A basic moving-average forecast achieved an RMSE of 3,200, so the final model reduces error by 25% relative to that baseline."
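The 25% figure in the example follows directly from the two RMSE values: (3,200 - 2,400) / 3,200 = 0.25. A small helper like the hypothetical `relative_improvement` below keeps that arithmetic explicit and lets you generate the summary sentence from your actual results rather than typing numbers by hand.

```python
def relative_improvement(baseline_error, model_error):
    """Fractional reduction in error relative to the baseline.

    A positive value means the model's error is lower than the baseline's.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error


# RMSE values taken from the example summary above.
improvement = relative_improvement(baseline_error=3200, model_error=2400)

summary_line = (
    f"The final model lowers RMSE by {improvement:.0%} "
    "relative to the moving-average baseline."
)
```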
How to structure the write-up
The take-home is most commonly submitted as a write-up. Verbal presentations are rare, but presentations and write-ups can follow the same outline that’s given below.
A great outline includes these major components:
- Executive summary
  - BLUF: your conclusion or core finding
  - “So-what”: the action unlocked and suggested next steps
  - How: the method you used
- Assumptions
  - What are you assuming or taking for granted?
  - Open questions: What is inconclusive or unknown?
- Data analysis
  - Problem-solving approach: north star/definition of success, main approach and rationale, and definition of baseline (optional)
  - Plan A (i.e. main method used) description: plot/chart that demonstrates a clear answer (e.g. point estimate with confidence intervals if projecting a trend or ROC in an ML classification problem), code (can also go in the Appendix)
  - Alternative approach description
- Main findings and results deep-dive
  - Specific results you may want to explore further (mostly optional)
- Caveats
  - For example, reasons why trends might not apply or be less accurate than expected
- Suggested next steps
  - Business decisions to take based on these results, and/or
  - Subsequent analysis to do and why
- Appendix
  - Data exploration
  - Data cleaning: any big assumptions made with the data or in cleaning the data should be included in the Assumptions section