Data Scientist Interview Questions

Review this list of 165 data scientist interview questions and answers verified by hiring managers and candidates.
  • Jiin S. - "I would use A/B testing to see whether the new feature is incrementally beneficial. To begin, we should define the goal of the test. Let's say the new feature is expected to increase the average number of trades by X. Then randomly assign the clients to two groups, control and test. The control group doesn't see the new feature and the test group does. We could also use stratified sampling if we want to make sure we cover different customer segments. During this desig…"

    Data Scientist
    Statistics & Experimentation
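The random assignment and comparison described in this answer can be sketched roughly as follows. This is a minimal illustration, not the answerer's method: the function names (`assign_groups`, `welch_t`) are hypothetical, and Welch's t-statistic is an assumed choice of test for the difference in average trades.

```python
import random
import statistics
from math import sqrt


def assign_groups(client_ids, seed=0):
    """Randomly split clients into control and test groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(client_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def welch_t(control, test):
    """Welch's t-statistic for the difference in means (test minus control)."""
    standard_error = sqrt(
        statistics.variance(control) / len(control)
        + statistics.variance(test) / len(test)
    )
    return (statistics.mean(test) - statistics.mean(control)) / standard_error


# Hypothetical client IDs; in practice these would come from the client base.
control_ids, test_ids = assign_groups(range(1000))
```

A large positive statistic would suggest that the test group trades more on average; turning it into a decision still requires a significance threshold and sample size chosen before the experiment starts.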
  • Jyoti V. - "Overfitting occurs when a model fails to generalize to new data and has high variance within the training data, whereas in underfitting the model isn't able to uncover the underlying pattern in the training data and has high bias. Tree-based models like decision trees and random forests are likely to overfit, whereas linear models like linear regression and logistic regression tend to underfit. There are many reasons why a random forest can overfit easily: 1. Model has grown to its full depth a…"

    Data Scientist
    Concept
    +2 more
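The contrast between high variance and high bias can be illustrated with two deliberately extreme models. This is a sketch under assumptions, not taken from the answer: the target `y = 2x + noise` is made up, a 1-nearest-neighbor predictor stands in for a model that memorizes its training data, and a constant predictor stands in for a model too simple to capture the pattern.

```python
import random

rng = random.Random(0)


def make_data(n):
    """Noisy samples of the hypothetical target y = 2x + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 3)) for x in xs]


def mse(predict, data):
    """Mean squared error of a predictor over (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


train, test = make_data(50), make_data(50)


def nearest_neighbor(x):
    """High variance: returns the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


mean_y = sum(y for _, y in train) / len(train)


def constant(x):
    """High bias: ignores x and always predicts the training mean."""
    return mean_y


train_mse_1nn = mse(nearest_neighbor, train)  # zero: the training set is memorized
test_mse_1nn = mse(nearest_neighbor, test)    # larger: the memorized noise does not generalize
train_mse_const = mse(constant, train)        # large even on training data
test_mse_const = mse(constant, test)
```

The gap between training and test error is the symptom of overfitting; a high error on both is the symptom of underfitting.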
  • Kanishkan V. - "I responded with a project that I was a part of during my capstone class. I described how I used HTML, Python, and PostgreSQL in conjunction to create a functioning website using Scrum."

    Data Scientist
    Behavioral
    +1 more
  • Data Scientist
    Analytical
    +2 more
  • Data Scientist
    Coding
    +3 more
  • Vineet M. - "The distribution of daily minutes spent on Facebook per user is heavily right-skewed with a long tail. Most users spend a short amount of time while a smaller segment of heavy users push up the average with 2–3+ hours daily."

    Data Scientist
    Statistics & Experimentation
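One way to see the effect of a long right tail is to compare the mean and the median of a skewed sample. The sketch below uses a log-normal distribution with made-up parameters purely as a stand-in; it is not real Facebook usage data.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical daily minutes per user, drawn from a log-normal distribution.
# The parameters (mu=3, sigma=1) are illustrative assumptions only.
minutes = [rng.lognormvariate(3, 1) for _ in range(10_000)]

mean_minutes = statistics.mean(minutes)
median_minutes = statistics.median(minutes)

# In a right-skewed sample the heavy users pull the mean above the median.
skewed_right = mean_minutes > median_minutes
```

Because the mean is sensitive to the tail, the median or selected percentiles are often more representative summaries for this kind of metric.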
  • Meenakshi D. - "BETWEEN and HAVING clauses in SQL serve different purposes: 1. BETWEEN clause: used to filter rows based on a range of values. Works with numeric, date, or text values. Can be used with WHERE or HAVING clauses. The range includes both lower and upper bounds. Example: filtering employees with salaries between 30,000 and 50,000: `SELECT * FROM Employees WHERE salary BETWEEN 30000 AND 50000;` 2. HAVING clause: used to filter groups…"

    Data Scientist
    Concept
    +4 more
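The difference can be tried out with Python's built-in `sqlite3` module. This is a small sketch: the `Employees` table name comes from the answer, but its columns and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        ("Ana", "eng", 30000),
        ("Bo", "eng", 50000),
        ("Cy", "ops", 20000),
        ("Di", "ops", 40000),
        ("Ed", "hr", 45000),
    ],
)

# BETWEEN in a WHERE clause filters individual rows; both bounds are inclusive.
in_range = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM Employees "
        "WHERE salary BETWEEN 30000 AND 50000 ORDER BY name"
    )
]

# HAVING filters groups after aggregation; BETWEEN can be used inside it too.
mid_paid_depts = [
    row[0]
    for row in conn.execute(
        "SELECT dept FROM Employees GROUP BY dept "
        "HAVING AVG(salary) BETWEEN 40000 AND 50000 ORDER BY dept"
    )
]
```

Here `in_range` keeps the employees whose own salary falls in the range, while `mid_paid_depts` keeps the departments whose average salary does.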
  • Asked at Amazon

    Ali H. - "SQL databases are relational, NoSQL databases are non-relational. SQL databases use structured query language and have a predefined schema. NoSQL databases have dynamic schemas for unstructured data. SQL databases are vertically scalable, while NoSQL databases are horizontally scalable."

    Data Scientist
    Concept
    +7 more
  • Asked at Adobe
    Video answer for 'Merge k sorted linked lists.'

    Guilherme F. - "A much better solution than the one in the article, below: It looks like the ones writing articles here in JavaScript do not understand the time/space complexity of JavaScript methods: shift, splice, sort, etc. In the solution article you have a shift and a sort being done inside a while loop, that is, the multiplication of Ns. My solution, below, iterates through the list once and then sorts it, separately. It's O(N log N). class ListNode { constructor(val = 0, next = null) { th…"

    Data Scientist
    Data Structures & Algorithms
    +4 more
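For comparison, a common alternative to collecting all values and sorting them is to merge with a min-heap, which runs in O(N log k) for N total nodes across k lists. The sketch below is not the commenter's solution; it is written in Python rather than JavaScript, and the helpers `from_list` and `to_list` are hypothetical conveniences for building and reading lists.

```python
import heapq


class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def merge_k_lists(lists):
    """Merge k sorted linked lists by repeatedly popping the smallest head."""
    heap = []
    for index, node in enumerate(lists):
        if node is not None:
            # The list index breaks ties so ListNode objects are never compared.
            heapq.heappush(heap, (node.val, index, node))

    dummy = tail = ListNode()
    while heap:
        _, index, node = heapq.heappop(heap)
        tail.next = node
        tail = node
        if node.next is not None:
            heapq.heappush(heap, (node.next.val, index, node.next))
    return dummy.next


def from_list(values):
    """Build a linked list from a Python list (helper for the example)."""
    dummy = tail = ListNode()
    for value in values:
        tail.next = ListNode(value)
        tail = tail.next
    return dummy.next


def to_list(node):
    """Read a linked list back into a Python list (helper for the example)."""
    values = []
    while node is not None:
        values.append(node.val)
        node = node.next
    return values
```

The heap never holds more than one node per input list, so extra space stays at O(k) regardless of how long the lists are.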
  • Data Scientist
    Behavioral
  • Asked at Adobe

    Rishi G. - "Problem Statement: The Fibonacci sequence is defined as F(n) = F(n-1) + F(n-2) with F(0) = 1 and F(1) = 1. The solution is given in the problem statement itself. If the value of n = 0, return 1. If the value of n = 1, return 1. Otherwise, return the sum of the values at (n - 1) and (n - 2). Explanation: The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones, typically starting with 0 and 1. Java Solution: public static int fib(int n…"

    Data Scientist
    Data Structures & Algorithms
    +2 more
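The recurrence in this answer can be sketched as below, using the answer's own convention F(0) = F(1) = 1. This is an illustrative Python version, not the truncated Java solution; it iterates instead of recursing, which is an assumed design choice to avoid recomputing subproblems.

```python
def fib(n):
    """Return F(n) where F(0) = F(1) = 1 and F(n) = F(n-1) + F(n-2)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    previous, current = 1, 1  # F(0), F(1)
    for _ in range(n - 1):
        previous, current = current, previous + current
    return current
```

A direct recursive translation of the definition takes exponential time, whereas this loop runs in O(n) time with O(1) extra space.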
  • Asked at Adobe
    Data Scientist
    Data Structures & Algorithms
    +4 more
  • Asked at OpenAI

    Constantin P. - "Reinforcement Learning is a type of machine learning where an agent learns to make decisions by trying out different actions and receiving rewards or penalties in return. The goal is to learn, over time, which actions yield the highest rewards. There are three core components in RL: the agent, the learner or decision-maker (e.g., an algorithm or robot); the environment, everything the agent interacts with; and actions and rewards: the agent takes actions, and the environmen…"

    Data Scientist
    Concept
    +1 more
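The loop of acting, receiving a reward, and updating can be sketched with a two-armed bandit, one of the simplest reinforcement learning settings. Everything here is an assumption made for illustration: the epsilon-greedy strategy, the success probabilities, and the function name `run_bandit` are not from the answer.

```python
import random


def run_bandit(success_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent interacting with a Bernoulli bandit environment."""
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms    # how often each action was taken
    values = [0.0] * n_arms  # running estimate of each action's reward

    for _ in range(steps):
        # Agent: explore with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)
        else:
            action = max(range(n_arms), key=lambda a: values[a])

        # Environment: return a reward of 1 or 0 for the chosen action.
        reward = 1.0 if rng.random() < success_probs[action] else 0.0

        # Learning: move the estimate toward the observed reward.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


values, counts = run_bandit([0.2, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
```

Over time the estimate for the better arm rises above the other, so the agent chooses it on most steps while still exploring occasionally.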
  • Asked at Discord
    Data Scientist
    Behavioral
    +1 more
  • Aneesha K. - "`SELECT u.userid AS userid, IFNULL(SUM(purchase_value), 0) AS LTV FROM user_sessions u JOIN attribution a ON u.sessionid = a.sessionid GROUP BY u.userid ORDER BY LTV DESC;` Needs a full join. Wondering why we can't do a left outer join here. All the sessions should have complete data."

    Data Scientist
    Coding
    +3 more
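The join question raised in this answer can be explored with a small `sqlite3` experiment. The table and column names follow the query in the answer, but the schema and rows are invented, and the sketch does not settle which join the original exercise expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_sessions (userid INTEGER, sessionid INTEGER);
    CREATE TABLE attribution (sessionid INTEGER, purchase_value REAL);
    INSERT INTO user_sessions VALUES (1, 10), (1, 11), (2, 20), (3, 30);
    -- Session 30 (user 3) has no attribution row in this made-up data.
    INSERT INTO attribution VALUES (10, 5.0), (11, 7.5), (20, 3.0);
    """
)

query = """
SELECT u.userid, IFNULL(SUM(a.purchase_value), 0) AS ltv
FROM user_sessions u
{join} attribution a ON u.sessionid = a.sessionid
GROUP BY u.userid
ORDER BY ltv DESC, u.userid
"""

# An inner join drops users whose sessions have no attribution rows.
inner_rows = conn.execute(query.format(join="JOIN")).fetchall()

# A left join keeps every user; IFNULL turns the missing sum into 0.
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()
```

With this data the inner join returns two users and the left join returns three, the third with an LTV of 0. If every session really has an attribution row, the two queries return the same result.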
  • Asked at Adobe
    Data Scientist
    Data Structures & Algorithms
    +4 more
  • Asked at Google

    Surbhi G. - "Deep Learning is a part of Artificial Intelligence. It's like teaching the machine to think and make decisions on its own. It's like how we teach a child the concept of an apple: it's round, red, has a stem on top. We show them multiple pictures of apples and then they understand and can recognize an apple in the future. Similarly, we feed lots of data to the machine, and slowly it starts learning from that data and can then make relevant predictions or decisions based on what it has learned. A co…"

    Data Scientist
    Concept
    +3 more
  • Dev S. - "Clarifying questions: When we say there is a decrease in users adding bank accounts, I would like to understand how users are making payments within Venmo. I assume they are using either their credit cards or debit cards? I would like to understand why adding bank accounts is integral to Venmo, since users are already using debit and credit cards. My understanding is that when payments happen through debit card rails, Venmo pays higher interchange fees, and to reduce any losses incurred…"

    Data Scientist
    Analytical
    +1 more
  • Asked at Amazon
    Video answer for 'What are common linear regression problems?'

    Ilnur I. - "I can try to summarize their discussion as I remember it. Linear regression is one of the methods used to predict a target (Y) using features (X). The formula for linear regression is a linear function of the features. The aim is to choose the coefficients (Theta) of the prediction function in such a way that the difference between target and prediction is smallest on average. This difference between target and prediction is called the loss function. The form of this loss function could depend on the particular real…"

    Data Scientist
    Analytical
    +2 more
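For a single feature, choosing the coefficients that minimize the average squared difference has a closed-form solution. The sketch below assumes mean squared error as the loss, which is one common choice rather than something the answer specifies, and the names `fit_line` and `mse` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares coefficients (theta0, theta1) for y ≈ theta0 + theta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def mse(xs, ys, theta0, theta1):
    """Mean squared error between targets and the line's predictions."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Points that lie exactly on y = 1 + 2x, so the fit should recover (1, 2).
xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]
theta0, theta1 = fit_line(xs, ys)
loss = mse(xs, ys, theta0, theta1)
```

Other losses, such as mean absolute error, lead to different coefficients and generally have no closed-form solution.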
  • Asked at Meta (Facebook)

    Vishal S. - "Product Understanding - Push notifications are pop-up notifications received on the device (phone, tablet, etc.) sent by various Meta apps whenever a new post has been made or a new message is received. Clarifying Questions - Is it specific to one device? Is it specific to one product? Is it specific to one region? Is it specific to one OS? Is this a result of changes to the algorithm/UI? An existing or a new feature? Assumptions - KPI calculation will only be for users who h…"

    Data Scientist
    Analytical
    +2 more
Showing 81-100 of 165