Data Scientist Interview Questions

Review this list of 155 data scientist interview questions and answers verified by hiring managers and candidates.
  • Asked at Amazon

    Ali H. - "SQL databases are relational; NoSQL databases are non-relational. SQL databases use Structured Query Language and have a predefined schema, while NoSQL databases have dynamic schemas suited to unstructured data. SQL databases typically scale vertically, while NoSQL databases typically scale horizontally."

    Data Scientist
    Concept
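The schema contrast in the answer above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module for the relational side and plain dictionaries as a stand-in for a document store. The `users` table, the field names, and the sample records are all hypothetical.

```python
import sqlite3

# Relational side: the schema is declared up front and checked on every insert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))

try:
    # A column that is not part of the declared schema is rejected.
    conn.execute(
        "INSERT INTO users (id, name, nickname) VALUES (?, ?, ?)",
        (2, "Bob", "bobby"),
    )
    schema_enforced = False
except sqlite3.OperationalError:
    schema_enforced = True

# Document side (a stand-in for a NoSQL document store, not a real client):
# each record carries its own set of fields.
documents = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bob", "nickname": "bobby"},  # extra field, no schema change
]
fields_per_document = [sorted(doc.keys()) for doc in documents]

print(schema_enforced)
print(fields_per_document)
```

Adding `nickname` on the relational side would require an explicit `ALTER TABLE`, whereas the document list accepts the new field as-is.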
  • Asked at Discord
    Data Scientist
    Behavioral
  • Asked at Adobe

    Alvaro R. -

        #include <climits>  // LONG_MIN, LONG_MAX

        // Every node's value must lie strictly between the bounds inherited from its ancestors.
        bool isValidBST(TreeNode* root, long min = LONG_MIN, long max = LONG_MAX) {
            if (root == NULL) return true;
            if (root->val <= min || root->val >= max) return false;
            return isValidBST(root->left, min, root->val) &&
                   isValidBST(root->right, root->val, max);
        }

    Data Scientist
    Coding
  • Asked at Adobe

    Rick E. -

        # O(n) time, O(1) space
        from typing import List

        def max_subarray_sum(nums: List[int]) -> int:
            if len(nums) == 0:
                return 0
            max_sum = curr_sum = nums[0]
            for i in range(1, len(nums)):
                # Either extend the current run or start a new one at nums[i].
                curr_sum = max(curr_sum + nums[i], nums[i])
                max_sum = max(curr_sum, max_sum)
            return max_sum

        # debug your code below
        print(max_subarray_sum([-1, 2, -3, 4]))

    Data Scientist
    Data Structures & Algorithms
  • Asked at Adobe

    Samuel M. -

        function main() {
          const v1 = [2, 3, 4, 10];
          const v2 = [3, 4, 5, 20, 23];
          return merge(v1, v2);
        }

        function merge(left, right) {
          let result = [];  // reassigned by concat below, so it cannot be const
          while (left.length > 0 && right.length > 0) {
            // Move the smaller head element into the result.
            if (left[0] < right[0]) {
              result.push(left.shift());
            } else {
              result.push(right.shift());
            }
          }
          // Append whatever remains of either input.
          if (left.length > 0) {
            result = result.concat(left);
          }
          if (right.length > 0) {
            result = result.concat(right);
          }
          return result;
        }

    Data Scientist
    Data Structures & Algorithms
  • Asked at Walmart Labs
    Data Scientist
    Behavioral
  • Brook - "Too much discussion of p-values… and theoretical things… countries are independent…"

    Data Scientist
    Analytical
  • Srijita P. - "Clarification question: How many subscription plans does Tinder offer? If there is more than one, is the fluctuation happening across all plans or in a particular one? Assumption: Let's say the lower-priced subscription plan shows the most fluctuation and there are only two types of plans. In this plan, which age group shows the most fluctuation (18-24, 25-30, 30+, etc.)? Is there any seasonality trend observed (eg: placemen"

    Data Scientist
    Technical
  • Asked at Google

    Aaron W. - "Clarification questions: What is the purpose of connecting to the DB? Do we expect high volumes of traffic to hit the DB? Do we have scalability or reliability concerns? Format: Code -> DB; Code -> Cache -> DB; API -> Cache -> DB (APIs are built for a purpose and have specified methods, such as GET, POST, and DELETE, for speaking to the DB. APIs can also use a contract to retrieve information from a DB much faster than code.); Load-balanced APIs -> Cache -> DB. **Aut"

    Data Scientist
    Concept
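The "Code -> Cache -> DB" option in the answer above can be sketched as a cache-aside read. This is a minimal illustration under stated assumptions: an in-memory SQLite table stands in for the database, a plain dictionary stands in for the cache, and the `products` table and `get_product_name` helper are hypothetical names.

```python
import sqlite3

# Hypothetical backing store: an in-memory SQLite table of products.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO products VALUES (1, 'widget')")

cache = {}    # stand-in for a real cache layer
db_reads = 0  # counts how often a lookup falls through to the database

def get_product_name(product_id):
    """Cache-aside read: check the cache first, query the DB only on a miss."""
    global db_reads
    if product_id in cache:
        return cache[product_id]
    db_reads += 1
    row = db.execute(
        "SELECT name FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    name = row[0] if row else None
    cache[product_id] = name  # note: misses (None) are cached too in this sketch
    return name

first = get_product_name(1)   # miss: reads from the DB and fills the cache
second = get_product_name(1)  # hit: served from the cache
print(first, second, db_reads)
```

A production setup would also need an invalidation or expiry policy so cached values do not go stale after writes, which this sketch leaves out.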
  • Asked at Amazon

    Moshe S. - "Law is my passion. Traveling all over the world in 5 years"

    Data Scientist
    Behavioral
  • Saurabh N. - "1) Have a common goal 2) Have clear and fair accountability between teams 3) Ensure conflicts on common issues are resolved in time 4) Promote joint brainstorming and problem-solving sessions 5) Most important, establish and practise clear and effective communication"

    Data Scientist
    Behavioral
  • Asked at McKinsey

    Himani E. - "Cases where the data is heavily influenced by outliers. Because the mean shifts in the presence of an outlier, the median may be a better measure."

    Data Scientist
    Statistics & Experimentation
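A small worked example of the point above, using Python's standard `statistics` module. The salary figures are made up for illustration.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); the extra value is an extreme outlier.
salaries = [48, 50, 52, 55, 60]
salaries_with_outlier = salaries + [1000]

print(mean(salaries), median(salaries))
print(mean(salaries_with_outlier), median(salaries_with_outlier))
```

Adding the single outlier moves the mean from 53 to roughly 210.8, while the median moves only from 52 to 53.5.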
  • Yuexiang Y. - "A Random Forest works by building an ensemble of decision trees, each trained on a slightly different version of the data. The key mechanism is bagging: for each tree, we sample the training data with replacement (bootstrapping), so every tree sees a different subset of examples. On top of that, at each split the algorithm randomly selects a subset of features, so trees explore different predictors. These two sources of randomness decorrelate the trees. When we aggregate them, by averag"

    Data Scientist
    Technical
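The two sources of randomness described above (bootstrap sampling and random feature subsets) can be sketched in plain Python. This is a deliberately simplified illustration, not a full Random Forest: each "tree" is a one-split decision stump, so choosing features per tree is the same as choosing them per split, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) among feature_ids with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        # Bagging: draw n training examples with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Feature subsampling: this stump only sees a random subset of columns.
        feature_ids = rng.sample(range(len(rows[0])), n_features)
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature_ids)
        )
    return forest

def predict_forest(forest, row):
    # Aggregate by majority vote across all stumps.
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated classes, two features.
rows = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.0), (2.5, 1.5),
        (6.0, 5.5), (6.5, 6.2), (7.0, 5.8), (7.5, 6.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict_forest(forest, (0.5, 0.5)), predict_forest(forest, (8.0, 7.0)))
```

For regression the vote would be replaced by an average of the individual predictions.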
  • Anonymous Hornet - "To handle the non-uniform sampling, I'd first clean the dataset and divide it into chunks of 'uniform' trajectory data at n-second intervals (e.g. 5s or 10s trajectories). This gives us cleaner trajectory chunks, T, of format (ship_ID, x, y, z, timestamp). For the system itself, I'd use a generative model, e.g. a Variational AutoEncoder (VAE), and train the model's 'encoder' to produce a latent-space representation of the input features (x, y, z, timestamp) from T, and its 'decoder' to pred"

    Data Scientist
    System Design
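The preprocessing step in the answer above (turning non-uniformly sampled positions into uniform chunks) might look like the sketch below. The answer does not say how the uniform samples are obtained, so linear interpolation is an assumption here, as are the `(timestamp, x, y, z)` tuple layout, the function names, and the sample track. The VAE itself is not covered.

```python
from bisect import bisect_right

def resample_uniform(points, step):
    """Linearly interpolate (timestamp, x, y, z) samples onto a grid of `step` seconds."""
    points = sorted(points)
    times = [p[0] for p in points]
    out = []
    t = times[0]
    while t <= times[-1]:
        i = bisect_right(times, t) - 1  # index of the last sample at or before t
        if i == len(points) - 1:
            out.append((t, *points[i][1:]))
        else:
            t0, t1 = times[i], times[i + 1]
            w = (t - t0) / (t1 - t0)  # fraction of the way from sample i to i + 1
            coords = tuple(
                a + w * (b - a) for a, b in zip(points[i][1:], points[i + 1][1:])
            )
            out.append((t, *coords))
        t += step
    return out

def chunk(samples, size):
    """Split the resampled track into consecutive chunks of `size`, dropping any remainder."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

# Hypothetical track for one ship, sampled at irregular times (seconds).
track = [
    (0, 0.0, 0.0, 0.0),
    (4, 4.0, 2.0, 0.0),
    (5, 5.0, 3.0, 0.0),
    (6, 6.0, 3.0, 0.0),
    (10, 10.0, 7.0, 0.0),
]

uniform = resample_uniform(track, 2)  # one sample every 2 seconds
chunks = chunk(uniform, 3)            # fixed-length chunks of 3 samples
print(uniform)
print(len(chunks))
```

Each chunk then has the same length and spacing, which is what a fixed-size encoder input would need.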
  • Jayveer S. -

        WITH suspicious_transactions AS (
            SELECT c.first_name,
                   c.last_name,
                   t.receipt_number,
                   COUNT(t.receipt_number) OVER (PARTITION BY c.customer_id) AS no_of_offences
            FROM customers c
            JOIN transactions t ON c.customer_id = t.customer_id
            WHERE t.receipt_number LIKE '%999%'
               OR t.receipt_number LIKE '%1234%'
               OR t.receipt_number LIKE '%XYZ%'
        )
        SELECT first_name, last_name, receipt_number, no_of_offences
        FROM suspicious_transactions
        WHERE no_of_offences >= 2;

    Data Scientist
    Coding
  • Alina G. -

        SELECT upsell_campaign_id,
               COUNT(DISTINCT trans.user_id) AS eligible_users
        FROM campaign
        JOIN "transaction" AS trans
          ON transaction_date BETWEEN date_start AND date_end
        JOIN user ON trans.user_id = user.user_id
        WHERE is_eligible_for_upsell_campaign = 1
        GROUP BY upsell_campaign_id

    Data Scientist
    Coding
  • Asked at Figma
    Data Scientist
    Behavioral
  • Mark M. - "I don't have experience working with a lot of Biological Scientists. Most of my experience comes with Data Scientists. Described how I used ideation techniques like brainstorming and other creative ways to get people to find common ground. I also mentioned how I like to do surveys before meetings to prompt people and also get unbiased opinions"

    Data Scientist
    Behavioral
Showing 101-120 of 155