Data Scientist Interview Questions

Review this list of 278 data scientist interview questions and answers verified by hiring managers and candidates.
  • Asked at Meta (Facebook)

    rkk293 - "How would you increase the number of comments on groups?"

    Data Scientist
    Product Design
  • Asked at Adobe
    Video answer for 'Merge k sorted linked lists.'

    Guilherme F. - "A much better solution than the one in the article, below: It looks like the ones writing articles here in JavaScript do not understand the time/space complexity of JavaScript methods. shift, splice, sort, etc... In the solution article you have a shift and a sort being done inside a while, that is, the multiplication of Ns. My solution, below, iterates through the list once and then sorts it, separately. It's O(N log N) class ListNode { constructor(val = 0, next = null) { th"

    Data Scientist
    Data Structures & Algorithms
    +4 more
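
The approach described in the answer above (walk every list once, collect the values, then sort a single time outside the loop) might be sketched in Python as follows. This is a minimal sketch, not the answer's own code: `ListNode` mirrors the class named in the snippet, while `merge_k_lists`, `from_list`, and `to_list` are hypothetical names introduced for illustration. For N total nodes the cost is dominated by the sort, so it is O(N log N).

```python
from typing import List, Optional


class ListNode:
    def __init__(self, val: int = 0, next: "Optional[ListNode]" = None):
        self.val = val
        self.next = next


def merge_k_lists(lists: List[Optional[ListNode]]) -> Optional[ListNode]:
    # Pass 1: visit every node of every list exactly once, O(N).
    values = []
    for head in lists:
        node = head
        while node is not None:
            values.append(node.val)
            node = node.next

    # Sort once, outside any loop, O(N log N).
    values.sort()

    # Rebuild one linked list from the sorted values, O(N).
    dummy = ListNode()
    tail = dummy
    for value in values:
        tail.next = ListNode(value)
        tail = tail.next
    return dummy.next


def from_list(items: List[int]) -> Optional[ListNode]:
    # Hypothetical helper: build a linked list from a Python list.
    dummy = ListNode()
    tail = dummy
    for item in items:
        tail.next = ListNode(item)
        tail = tail.next
    return dummy.next


def to_list(head: Optional[ListNode]) -> List[int]:
    # Hypothetical helper: flatten a linked list back into a Python list.
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


merged = merge_k_lists([from_list([1, 4, 5]), from_list([1, 3, 4]), from_list([2, 6])])
print(to_list(merged))
```

A min-heap over the k list heads is a common alternative that runs in O(N log k) and uses the fact that each input list is already sorted.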
  • Salome L. - "SELECT AVG(julianday(date_end) - julianday(date_start)) AS avg_campaign_duration FROM campaign;"

    Data Scientist
    Coding
    +1 more
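
To see what the query above computes, it can be run against a small in-memory SQLite database. This is only a sketch: the `campaign` table layout, the column names `date_start` and `date_end`, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per campaign with ISO-formatted start and end dates.
conn.execute(
    "CREATE TABLE campaign (id INTEGER PRIMARY KEY, date_start TEXT, date_end TEXT)"
)
conn.executemany(
    "INSERT INTO campaign (date_start, date_end) VALUES (?, ?)",
    [("2024-01-01", "2024-01-11"), ("2024-02-01", "2024-02-05")],
)

# julianday() converts a date to a fractional day number, so the
# difference of two julianday() values is a duration in days.
avg_days = conn.execute(
    "SELECT AVG(julianday(date_end) - julianday(date_start)) "
    "AS avg_campaign_duration FROM campaign"
).fetchone()[0]
print(avg_days)
```

With these two sample campaigns (10 days and 4 days long) the average duration is 7 days.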
  • Asked at McKinsey

    Himani E. - "The cases where data is under heavy outlier influence. Since mean fluctuates due to the presence of an outlier, median might be a better measure"

    Data Scientist
    Statistics & Experimentation
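
A quick numeric illustration of the point above, using made-up values: adding a single outlier moves the mean a long way while the median barely changes.

```python
from statistics import mean, median

# Illustrative data only; the numbers are not from the original answer.
values = [10, 12, 11, 13, 12]
with_outlier = values + [500]

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("without outlier:", mean_before, median_before)
print("with outlier:   ", mean_after, median_after)
```

Here the mean jumps from 11.6 to 93, while the median stays at 12.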
  • Asked at Apple
    Data Scientist
    Data Structures & Algorithms
    +4 more
  • Asked at Meta (Facebook)

    Vishal S. - "Product Understanding - Push notifications are pop-up notifications received on the device (phone, tablet, etc.) sent by various Meta apps whenever a new post has been made or a new message is received Clarifying Questions - Is it specific to one device? Is it specific to one product? Is it specific to one region? Is it specific to one OS? Is this a result of changes to the algorithm/UI? Existing or a new feature? Assumptions - KPI calculation will only be for users who h"

    Data Scientist
    Analytical
    +1 more
  • Asked at Google

    Surbhi G. - "Deep Learning is a part of Artificial Intelligence, it's like teaching the machine to think and make decisions on its own. It's like how we teach a child the concept of an apple - it's round, red, has a stem on top. We show them multiple pictures of apples and then they understand and can recognize an apple in future. Similarly, we feed lots of data to the machine, and slowly, it starts learning from that data, and can then make relevant predictions or decisions based on what it has learnt. A co"

    Data Scientist
    Concept
    +3 more
  • Derrick M. - "SELECT u.id as user_id, u.name, COUNT(t.product_id) AS orders FROM users u JOIN transactions t ON t.user_id = u.id JOIN products p ON p.id = t.product_id GROUP BY u.id, u.name ORDER BY orders DESC LIMIT 1"

    Data Scientist
    Coding
    +1 more
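
The query above can be tried against a tiny in-memory SQLite database. This is a sketch under assumptions: the table layouts and sample rows are invented for illustration, and the join to `products` is left out here because the count does not use any product column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema and sample data.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, product_id INTEGER);
INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO transactions (user_id, product_id) VALUES (1, 10), (2, 10), (2, 11), (2, 12);
""")

# Count transactions per user and keep the user with the most orders.
top_user = conn.execute("""
SELECT u.id AS user_id, u.name, COUNT(t.product_id) AS orders
FROM users u
JOIN transactions t ON t.user_id = u.id
GROUP BY u.id, u.name
ORDER BY orders DESC
LIMIT 1
""").fetchone()
print(top_user)
```

Note that `LIMIT 1` returns a single row even when several users tie for the highest count.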
  • Data Scientist
    Statistics & Experimentation
  • Alexey T. - "with t1 as (select employee_name, department_id, salary, avg(salary) over (partition by department_id) as avg_salary, abs(salary - avg(salary) over (partition by department_id)) as diff from employees) select employee_name, department_id, salary, avg_salary, dense_rank() over (partition by department_id order by diff desc) as deviation_rank from t1 order by department_id asc, deviation_rank asc, employee_name"

    Data Scientist
    Coding
    +1 more
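
The window-function query above can be exercised with Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer). The `employees` rows below are made up for illustration; the identifiers follow the answer with their underscores restored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and sample data.
conn.executescript("""
CREATE TABLE employees (employee_name TEXT, department_id INTEGER, salary REAL);
INSERT INTO employees VALUES
  ('A', 1, 100), ('B', 1, 200), ('C', 1, 300),
  ('D', 2, 50),  ('E', 2, 150);
""")

rows = conn.execute("""
WITH t1 AS (
  SELECT employee_name, department_id, salary,
         AVG(salary) OVER (PARTITION BY department_id) AS avg_salary,
         ABS(salary - AVG(salary) OVER (PARTITION BY department_id)) AS diff
  FROM employees
)
SELECT employee_name, department_id, salary, avg_salary,
       DENSE_RANK() OVER (PARTITION BY department_id ORDER BY diff DESC) AS deviation_rank
FROM t1
ORDER BY department_id ASC, deviation_rank ASC, employee_name
""").fetchall()

for row in rows:
    print(row)
```

`DENSE_RANK` gives tied deviations the same rank without leaving gaps, so in department 1 both 'A' and 'C' (each 100 away from the average of 200) get rank 1 and 'B' gets rank 2.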
  • Gabriel P. - "Hi, my solution gives the exact numerical values as the proposed solution, but it doesn't pass the tests. Am I missing something, or is this a bug? def find_revenue_by_city(transactions: pd.DataFrame, users: pd.DataFrame, exchange_rate: pd.DataFrame) -> pd.DataFrame: # gets user city for each user id user_ids = users[['id', 'user_city']] # and merge on transactions transactions = transactions.merge(user_ids, how='left"

    Data Scientist
    Coding
    +2 more
  • Asked at Figma
    Data Scientist
    Behavioral
    +2 more
  • Data Scientist
    Statistics & Experimentation
  • Arshad P. - "Schema is wrong - id from product is mapped to id from transactions, id from product should point to product_id in transactions table"

    Data Scientist
    Coding
    +2 more
  • Gowtami K. - "with cte as ( select user_id, timestamp as current_login, lag(timestamp,1) over(partition by user_id order by timestamp) as previous_login, round(abs(julianday(timestamp)-julianday(lag(timestamp,1) over(partition by user_id order by timestamp)))*24*60) as minutes_elapsed from user_activity_log where activity_type ='LOGIN' ) select user_id, current_login, previous_login, minutes_elapsed from cte where current_login previous_login"

    Data Scientist
    Coding
    +1 more
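
The core of the answer above, pairing each login with the previous one via `LAG` and converting the `julianday` difference from days to minutes, might look like this when run through Python's `sqlite3` module. The table contents are invented, and because the answer's final filter did not survive in the snippet, this sketch assumes a simple `previous_login IS NOT NULL` filter instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and sample data.
conn.executescript("""
CREATE TABLE user_activity_log (user_id INTEGER, activity_type TEXT, timestamp TEXT);
INSERT INTO user_activity_log VALUES
  (1, 'LOGIN',  '2024-01-01 10:00:00'),
  (1, 'LOGIN',  '2024-01-01 10:30:00'),
  (1, 'LOGOUT', '2024-01-01 10:45:00'),
  (2, 'LOGIN',  '2024-01-01 09:00:00');
""")

rows = conn.execute("""
WITH cte AS (
  SELECT user_id,
         timestamp AS current_login,
         LAG(timestamp, 1) OVER (PARTITION BY user_id ORDER BY timestamp) AS previous_login
  FROM user_activity_log
  WHERE activity_type = 'LOGIN'
)
SELECT user_id, current_login, previous_login,
       -- days * 24 hours * 60 minutes
       ROUND(ABS(julianday(current_login) - julianday(previous_login)) * 24 * 60) AS minutes_elapsed
FROM cte
WHERE previous_login IS NOT NULL  -- assumed filter: skip each user's first login
""").fetchall()
print(rows)
```

User 2 has only one login, so no row is produced for that user.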
  • Soma R. - "SELECT a.marketing_channel, AVG(a.purchase_value) AS avg_purchase_value, SUM(CASE WHEN a.purchase_value > 0 THEN 1 ELSE 0 END) * 100 / COUNT(a.session_id) AS conversion_rate FROM attribution a LEFT JOIN user_sessions u ON a.session_id = u.session_id GROUP BY a.marketing_channel ORDER BY conversion_rate DESC;"

    Data Scientist
    Coding
    +1 more
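
A small runnable version of the conversion-rate calculation above, using Python's `sqlite3` module. The schema and rows are assumptions for illustration, and the sketch departs from the answer in two stated ways: it multiplies by `100.0` rather than `100`, because SQLite truncates integer division, and it omits the join to `user_sessions`, since no column from that table is selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and sample data: one row per session.
conn.executescript("""
CREATE TABLE attribution (session_id INTEGER PRIMARY KEY, marketing_channel TEXT, purchase_value INTEGER);
INSERT INTO attribution VALUES
  (1, 'email', 20), (2, 'email', 0), (3, 'email', 0),
  (4, 'ads', 10),   (5, 'ads', 30);
""")

rows = conn.execute("""
SELECT a.marketing_channel,
       AVG(a.purchase_value) AS avg_purchase_value,
       -- 100.0 forces floating-point division
       SUM(CASE WHEN a.purchase_value > 0 THEN 1 ELSE 0 END) * 100.0
         / COUNT(a.session_id) AS conversion_rate
FROM attribution a
GROUP BY a.marketing_channel
ORDER BY conversion_rate DESC
""").fetchall()

for row in rows:
    print(row)
```

With the integer form `* 100 /`, the 'email' channel here (1 converting session out of 3) would report 33 instead of roughly 33.33.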
  • Aneesha K. - "Test case is wrong. It expects to sort in asc order of month_year. -- Write your query here SELECT strftime('%Y-%m', created_at) AS month_year, COUNT(DISTINCT user_id) AS num_customers, COUNT(t.id) AS num_orders, SUM(price * quantity) AS order_amt FROM transactions t INNER JOIN products p ON t.product_id = p.id GROUP BY month_year ORDER BY month_year;"

    Data Scientist
    Coding
    +1 more
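
The monthly roll-up in the answer above can be checked on a few sample rows with Python's `sqlite3` module. The table layouts and data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and sample data.
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE transactions (
  id INTEGER PRIMARY KEY, user_id INTEGER, product_id INTEGER,
  quantity INTEGER, created_at TEXT
);
INSERT INTO products VALUES (1, 10.0), (2, 5.0);
INSERT INTO transactions (user_id, product_id, quantity, created_at) VALUES
  (1, 1, 2, '2024-01-05'),
  (2, 2, 1, '2024-01-20'),
  (1, 1, 1, '2024-02-01');
""")

rows = conn.execute("""
SELECT strftime('%Y-%m', created_at) AS month_year,
       COUNT(DISTINCT user_id) AS num_customers,
       COUNT(t.id) AS num_orders,
       SUM(price * quantity) AS order_amt
FROM transactions t
INNER JOIN products p ON t.product_id = p.id
GROUP BY month_year
ORDER BY month_year
""").fetchall()

for row in rows:
    print(row)
```

Because `strftime('%Y-%m', ...)` yields zero-padded text such as `2024-01`, sorting the strings in ascending order is also chronological order.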
  • Anonymous Tortoise - "SELECT i.item_category, o.order_date, SUM(o.order_quantity) AS total_units_ordered FROM orders o JOIN items i ON o.item_id = i.item_id WHERE o.order_date >= DATE('now', '-6 days') GROUP BY i.item_category, o.order_date ORDER BY i.item_category ASC, o.order_date ASC;"

    Data Scientist
    Coding
    +1 more
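
A sketch of how the seven-day window in the query above behaves, run through Python's `sqlite3` module. The schema and rows are assumed for illustration; the sample dates are generated with SQLite's own `DATE('now', ...)` so they stay relative to the day the code runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema; order dates are stored as ISO text relative to today (UTC).
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_category TEXT);
CREATE TABLE orders (
  order_id INTEGER PRIMARY KEY, item_id INTEGER,
  order_date TEXT, order_quantity INTEGER
);
INSERT INTO items VALUES (1, 'books'), (2, 'toys');
INSERT INTO orders (item_id, order_date, order_quantity) VALUES
  (1, DATE('now'), 3),
  (1, DATE('now', '-2 days'), 2),
  (2, DATE('now', '-10 days'), 5);
""")

# DATE('now', '-6 days') through today spans seven calendar days.
rows = conn.execute("""
SELECT i.item_category, o.order_date, SUM(o.order_quantity) AS total_units_ordered
FROM orders o
JOIN items i ON o.item_id = i.item_id
WHERE o.order_date >= DATE('now', '-6 days')
GROUP BY i.item_category, o.order_date
ORDER BY i.item_category ASC, o.order_date ASC
""").fetchall()

for row in rows:
    print(row)
```

The order placed ten days ago falls outside the window, so only the two 'books' rows are returned. The string comparison on `order_date` works because the dates are stored in `YYYY-MM-DD` form.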
  • Srijita P. - "Clarification question: How many subscription plans are offered by Tinder? If there is more than one subscription plan, then we need to ask: is the fluctuation happening across all plans or in a particular one? Assumption: Let's say the lower-priced subscription plan is showing the most fluctuation and there are only two types of plans. In this subscription plan, which age group is showing the most fluctuation (18-24, 25-30, 30+, etc.)? Is there any seasonality trend observed (eg: placemen"

    Data Scientist
    Technical
  • Sangeeta P. - "Marketing campaigns are run through different channels such as social media, emails, SEO, web advertising, events, etc. Let’s look at some of the overall success metrics at a broader level: Total views for your campaign Unique views for your campaign Returning visitors for your campaign Engagement for your campaign (If it’s a social media campaign, the marketer might be interested in knowing the number of users engaging with the campaign and the type of campaign positive/negative) 5"

    Data Scientist
    Statistics & Experimentation
Showing 161-180 of 278