Data Scientist Interview Questions

Review this list of 125 data scientist interview questions and answers verified by hiring managers and candidates.
  • Walmart Labs logoAsked at Walmart Labs 
    Data Scientist
    Behavioral
    +5 more
  • Adobe logoAsked at Adobe 
    +5

    "bool isValidBST(TreeNode* root, long min = LONGMIN, long max = LONGMAX){ if (root == NULL) return true; if (root->val val >= max) return false; return isValidBST(root->left, min, root->val) && isValidBST(root->right, root->val, max); } `"

    Alvaro R. - "bool isValidBST(TreeNode* root, long min = LONGMIN, long max = LONGMAX){ if (root == NULL) return true; if (root->val val >= max) return false; return isValidBST(root->left, min, root->val) && isValidBST(root->right, root->val, max); } `"See full answer

    Data Scientist
    Coding
    +4 more
  • Figma logoAsked at Figma 
    Data Scientist
    Behavioral
    +2 more
  • McKinsey logoAsked at McKinsey 

    "The cases where data is under heavy outlier influence. Since mean fluctuates due to the presence of an outlier, median might be a better measure"

    Himani E. - "The cases where data is under heavy outlier influence. Since mean fluctuates due to the presence of an outlier, median might be a better measure"See full answer

    Data Scientist
    Statistics & Experimentation
  • Adobe logoAsked at Adobe 
    +2

    "public class sample { public int [] merge(int [] a, int [] b) { if(a == null || a.length == 0 || b == null || b.length == 0) return null; int i = 0, j = 0, index = -1; int [] merged = new int[a.length + b.length]; while (i < a.length && j < b.length) { if(a[i] < b[i]) merged[++index] = a[i++]; else merged[++index] = b[j++]; } while (i < a.length) { merged[++index] = a[i++]; } "

    Nikhil R. - "public class sample { public int [] merge(int [] a, int [] b) { if(a == null || a.length == 0 || b == null || b.length == 0) return null; int i = 0, j = 0, index = -1; int [] merged = new int[a.length + b.length]; while (i < a.length && j < b.length) { if(a[i] < b[i]) merged[++index] = a[i++]; else merged[++index] = b[j++]; } while (i < a.length) { merged[++index] = a[i++]; } "See full answer

    Data Scientist
    Data Structures & Algorithms
    +4 more
  • 🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

  • "Clarification question: How many subscription plans are offered by Tinder ? If there is more than one subscription plan, then we need to ask is the fluctuation happening across all plans or in a particular one ? Assumption: Let's say lower priced subscription plan is showing the most fluctuation and there are only two types of plans In this subscription plan which age group is showing the most fluctuation (18-24,25-30, 30+ etc) ? Is there any seasonality trend observed (eg: placemen"

    Srijita P. - "Clarification question: How many subscription plans are offered by Tinder ? If there is more than one subscription plan, then we need to ask is the fluctuation happening across all plans or in a particular one ? Assumption: Let's say lower priced subscription plan is showing the most fluctuation and there are only two types of plans In this subscription plan which age group is showing the most fluctuation (18-24,25-30, 30+ etc) ? Is there any seasonality trend observed (eg: placemen"See full answer

    Data Scientist
    Technical
  • "A Random Forest works by building an ensemble of decision trees, each trained on a slightly different version of the data. The key mechanism is bagging: for each tree, we sample the training data with replacement (bootstrapping), so every tree sees a different subset of examples. On top of that, at each split the algorithm randomly selects a subset of features, so trees explore different predictors. These two sources of randomness decorrelate the trees. When we aggregate them — by averag"

    Yuexiang Y. - "A Random Forest works by building an ensemble of decision trees, each trained on a slightly different version of the data. The key mechanism is bagging: for each tree, we sample the training data with replacement (bootstrapping), so every tree sees a different subset of examples. On top of that, at each split the algorithm randomly selects a subset of features, so trees explore different predictors. These two sources of randomness decorrelate the trees. When we aggregate them — by averag"See full answer

    Data Scientist
    Technical
  • "too much discussing on p-value…. and theoritical things…. country are independant…."

    Brook - "too much discussing on p-value…. and theoritical things…. country are independant…."See full answer

    Data Scientist
    Analytical
  • Adobe logoAsked at Adobe 
    +17

    " O(n) time, O(1) space from typing import List def maxsubarraysum(nums: List[int]) -> int: if len(nums) == 0: return 0 maxsum = currsum = nums[0] for i in range(1, len(nums)): currsum = max(currsum + nums[i], nums[i]) maxsum = max(currsum, max_sum) return max_sum debug your code below print(maxsubarraysum([-1, 2, -3, 4])) `"

    Rick E. - " O(n) time, O(1) space from typing import List def maxsubarraysum(nums: List[int]) -> int: if len(nums) == 0: return 0 maxsum = currsum = nums[0] for i in range(1, len(nums)): currsum = max(currsum + nums[i], nums[i]) maxsum = max(currsum, max_sum) return max_sum debug your code below print(maxsubarraysum([-1, 2, -3, 4])) `"See full answer

    Data Scientist
    Data Structures & Algorithms
    +4 more
  • Amazon logoAsked at Amazon 

    "1) Have a common goal 2) Have a clear and fair accountability between teams 3) Ensure conflicts are resolved in time on common issues 4) Promote common Brain-storming , problem solving sessions 5) Most important , Have clear and effective communication established and practised"

    Saurabh N. - "1) Have a common goal 2) Have a clear and fair accountability between teams 3) Ensure conflicts are resolved in time on common issues 4) Promote common Brain-storming , problem solving sessions 5) Most important , Have clear and effective communication established and practised"See full answer

    Data Scientist
    Behavioral
    +4 more
  • "To handle the non-uniform sampling, I'd first clean and divide the dataset into chunks of n second interval 'uniform' trajectory data(e.g. 5s or 10s trajectories). This gives us a cleaner trajectory data chunks, T, of format (ship_ID, x, y, z, timestamp) to be formed. For the system itself, I'd use a generative model, e.g. Variational AutoEncoder (VAE), and train the model's 'encoder' to produce a latent-space representation of input features (x,y,z,timestamp) from T, and it's 'decoder' to pred"

    Anonymous Hornet - "To handle the non-uniform sampling, I'd first clean and divide the dataset into chunks of n second interval 'uniform' trajectory data(e.g. 5s or 10s trajectories). This gives us a cleaner trajectory data chunks, T, of format (ship_ID, x, y, z, timestamp) to be formed. For the system itself, I'd use a generative model, e.g. Variational AutoEncoder (VAE), and train the model's 'encoder' to produce a latent-space representation of input features (x,y,z,timestamp) from T, and it's 'decoder' to pred"See full answer

    Data Scientist
    System Design
  • Amazon logoAsked at Amazon 

    "Law is my passion. Traveling all over the world in 5 years"

    Moshe S. - "Law is my passion. Traveling all over the world in 5 years"See full answer

    Data Scientist
    Behavioral
    +4 more
  • AstraZeneca logoAsked at AstraZeneca 

    "I don't have experience working with alot of Biological Scientists. Most of my experience comes with Data Scientists. Described how I used ideation techniques like brainstorming and other creative ways to get people to find common ground. I also mentioned how I like to do survey's before meetings to prompt people and also get unbiased opnions"

    Mark M. - "I don't have experience working with alot of Biological Scientists. Most of my experience comes with Data Scientists. Described how I used ideation techniques like brainstorming and other creative ways to get people to find common ground. I also mentioned how I like to do survey's before meetings to prompt people and also get unbiased opnions"See full answer

    Data Scientist
    Behavioral
  • Meta (Facebook) logoAsked at Meta (Facebook) 
    Data Scientist
    Product Strategy
  • "Probability that one of the coupons is used = 1 - Probability that no coupon is used = 1 - nC0 p^0 * (1-p)^n = 1 -(1-p)^n"

    Chetak C. - "Probability that one of the coupons is used = 1 - Probability that no coupon is used = 1 - nC0 p^0 * (1-p)^n = 1 -(1-p)^n"See full answer

    Data Scientist
    Statistics & Experimentation
  • Adobe logoAsked at Adobe 

    Permutations

    IDE
    Medium

    "function permute(nums) { if (nums.length <= 1) { return [nums]; } const prevPermutations = permute(nums.slice(0, nums.length-1)); const currentNum = nums[nums.length-1]; const permutations = new Set(); for (let prev of prevPermutations) { for (let i=0; i < prev.length; i++) { permutations.add([...prev.slice(0, i), currentNum, ...prev.slice(i)]); } permutations.add([...prev, currentNum]); } return [...permutations]"

    Tiago R. - "function permute(nums) { if (nums.length <= 1) { return [nums]; } const prevPermutations = permute(nums.slice(0, nums.length-1)); const currentNum = nums[nums.length-1]; const permutations = new Set(); for (let prev of prevPermutations) { for (let i=0; i < prev.length; i++) { permutations.add([...prev.slice(0, i), currentNum, ...prev.slice(i)]); } permutations.add([...prev, currentNum]); } return [...permutations]"See full answer

    Data Scientist
    Data Structures & Algorithms
    +3 more
  • "[I'm not sure whether the answer below is the best, as I have not gotten result and feedback from my interview] Ans: I would solve by first using a VAE-style model, to create a latent space embedding that translates user description to generate images. Training would be done on the 1000 avatar images and 100000 descriptions, following this scheme: VAE: description -> encoder -> latent space -> decoder -> image Q: "OK, but that means you're limiting the generated images to be only the 1000 imag"

    Nick S. - "[I'm not sure whether the answer below is the best, as I have not gotten result and feedback from my interview] Ans: I would solve by first using a VAE-style model, to create a latent space embedding that translates user description to generate images. Training would be done on the 1000 avatar images and 100000 descriptions, following this scheme: VAE: description -> encoder -> latent space -> decoder -> image Q: "OK, but that means you're limiting the generated images to be only the 1000 imag"See full answer

    Data Scientist
    Machine Learning
  • Walmart Labs logoAsked at Walmart Labs 
    Data Scientist
    Behavioral
    +2 more
Showing 81-100 of 125