Machine Learning Interview Questions

Review this list of 67 machine learning interview questions and answers verified by hiring managers and candidates.
  • Asked at Amazon
    Video answer for 'Implement k-means clustering.'
    Machine Learning Engineer
    Machine Learning
    +4 more
  • Machine Learning
    Coding
  • Asked at Pinterest
    Video answer for 'Implement a k-nearest neighbors algorithm.'
    +3

    Dinar M. - "An even faster, vectorized version: np.linalg.norm avoids the Python loop, and np.argpartition selects the k smallest distances. We don't need to sort the whole array; we only need the first k elements to be no larger than the rest.

        import numpy as np

        def knn(X_train, y_train, X_new, k):
            # Euclidean distance from X_new to every training row
            distances = np.linalg.norm(X_train - X_new, axis=1)
            # O(N) selection instead of an O(N log N) sort; requires k < len(X_train)
            k_indices = np.argpartition(distances, k)[:k]
            # Majority vote, assuming binary 0/1 labels
            return int(np.sum(y_train[k_indices]) > k / 2.0)"

    Machine Learning Engineer
    Machine Learning
    +1 more
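
The vectorized k-nearest neighbors answer above can be exercised on a small dataset. The sketch below repeats the function so that it runs on its own; the toy points, labels, and queries are hypothetical and chosen only for illustration, and the vote assumes binary 0/1 labels as in the answer.

```python
import numpy as np

def knn(X_train, y_train, X_new, k):
    # Euclidean distance from the query point to every training row
    distances = np.linalg.norm(X_train - X_new, axis=1)
    # Indices of the k smallest distances (unordered); requires k < len(X_train)
    k_indices = np.argpartition(distances, k)[:k]
    # Majority vote over binary 0/1 labels
    return int(np.sum(y_train[k_indices]) > k / 2.0)

# Hypothetical toy data: two well-separated clusters
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])

near_origin = knn(X_train, y_train, np.array([0.2, 0.1]), k=3)
near_far_cluster = knn(X_train, y_train, np.array([5.5, 5.5]), k=3)
print(near_origin, near_far_cluster)
```

Because np.argpartition only guarantees that the first k entries are the k smallest, not that they are sorted, this approach fits a majority vote but would need an extra sort if the neighbors had to be ranked.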
  • Machine Learning Engineer
    Machine Learning
    +1 more
  • +2

    Taha U. - "In detail: setting k=1 in KNN makes the model fit the training data very closely, capturing much of the data's noise, so it may not generalize well to unseen data. This is a high-variance scenario."

    Machine Learning
    Concept
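
The high-variance behavior of k=1 described above can be seen on a tiny one-dimensional example. This is a minimal sketch under stated assumptions: the data, the single mislabeled point, and the helper name knn_predict are all hypothetical, and labels are assumed to be binary 0/1.

```python
import numpy as np

def knn_predict(x_train, y_train, x_new, k):
    # Absolute distance in one dimension, then majority vote over the k nearest
    distances = np.abs(x_train - x_new)
    nearest = np.argsort(distances)[:k]
    return int(y_train[nearest].sum() > k / 2.0)

# Hypothetical data: the point at x=2 carries a noisy label (1 among 0s)
x_train = np.array([0, 1, 2, 3, 4, 10, 11, 12, 13, 14], dtype=float)
y_train = np.array([0, 0, 1, 0, 0, 1, 1, 1, 1, 1])

query = 2.1
pred_k1 = knn_predict(x_train, y_train, query, k=1)  # follows the noisy neighbor
pred_k3 = knn_predict(x_train, y_train, query, k=3)  # neighbors at 2, 3, 1 outvote the noise
print(pred_k1, pred_k3)
```

With k=1 the prediction copies whatever label the single closest point has, so one noisy label changes the output; with k=3 the two correctly labeled neighbors outvote it.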
  • Frank A. - "1) Create the experimental and control groups. 2) Calculate the conversion rate for each group as the mean of the convert column, which counts True as 1 and False as 0. 3) Calculate the test statistic by subtracting the two proportions and standardizing the difference. 4) Compute the p-value and compare it with 0.05. 5) Conclude that the difference is statistically significant if the p-value is less than 0.05; otherwise, there is no statistically significant difference."

    Machine Learning
    Coding
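
The five steps above can be sketched as a two-proportion z-test. This is one possible reading of "subtracting the proportions and standardizing": it assumes a pooled standard error and a two-sided test, and the function name and the conversion counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_control, n_control, conv_treat, n_treat):
    # Step 2: conversion rate of each group
    p_control = conv_control / n_control
    p_treat = conv_treat / n_treat
    # Step 3: standardize the difference using the pooled proportion
    p_pool = (conv_control + conv_treat) / (n_control + n_treat)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treat))
    z = (p_treat - p_control) / std_err
    # Step 4: two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 100/1000 conversions in control, 150/1000 in treatment
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
# Step 5: compare with the 0.05 significance level
significant = p_value < 0.05
print(z, p_value, significant)
```

The identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)) keeps the sketch within the standard library; a statistics package could be used for the same computation.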
  • Abinash S. - "While running the test loop I get the error RuntimeError: running_mean should contain 28 elements not 38. I think it comes from a difference between the categorical features in train and test."

    Machine Learning
    Coding
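
If the mismatch in the error above does come from categorical features, as the commenter suspects, one common remedy is to fix the one-hot vocabulary on the training data and reuse it for the test data so that both produce the same number of columns. The sketch below is an assumption about the cause, not a diagnosis; the helper names and the "city" column are hypothetical.

```python
def fit_categories(rows, column):
    # Collect the sorted set of category values seen in the training rows
    return sorted({row[column] for row in rows})

def one_hot(rows, column, categories):
    # Encode against a fixed category list; unseen values become all zeros
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for row in rows:
        vector = [0.0] * len(categories)
        position = index.get(row[column])
        if position is not None:
            vector[position] = 1.0
        encoded.append(vector)
    return encoded

# Hypothetical data: "Boston" appears only in the test split
train = [{"city": "NYC"}, {"city": "SF"}, {"city": "LA"}]
test = [{"city": "SF"}, {"city": "Boston"}]

categories = fit_categories(train, "city")
train_features = one_hot(train, "city", categories)
test_features = one_hot(test, "city", categories)
print(len(train_features[0]), len(test_features[0]))
```

Encoding each split independently would give three columns for train and two for test here, which is the kind of width mismatch a layer with fixed-size running statistics rejects.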
  • Asked at Nvidia

    Jyoti V. - "Overfitting occurs when a model fails to generalize to new data and shows high variance, whereas an underfit model is unable to uncover the underlying pattern in the training data and shows high bias. Tree-based models such as decision trees and random forests are prone to overfitting, whereas linear models such as linear regression and logistic regression tend to underfit. There are many reasons why a random forest can overfit easily: 1. Model has grown to its full depth a"

    Machine Learning Engineer
    Machine Learning
    +2 more
  • Abinash S. - "I checked, and the unit test fails its assertion, as you can see in the Colab notebook below.

        F
        FAIL: test_simple (__main__.Conv2dTest)
        Traceback (most recent call last):
          File "", line 19, in test_simple
            self.assertTrue(torch.equal(output, torch.tensor([[[[ 5., 1.], [ -2., -10.]]]])))
        AssertionError: False is not true"

    Machine Learning
  • Asked at Capital One

    Ihuoma remita U. - "Through a combination of online resources, hands-on projects, and community engagement."

    Machine Learning Engineer
    Machine Learning
    +1 more
  • L B. - "Data distribution drift: KL divergence or PSI (Population Stability Index). Performance falls into two categories: first, operational metrics such as runtime; second, model performance, measured by the loss function, MAE (for regression), and business metrics such as overall watch time, DAU, and revenue lift. Outliers: data distribution."

    Machine Learning Engineer
    Machine Learning
    +1 more
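
The Population Stability Index mentioned above can be sketched with NumPy. This is a minimal illustration, not a production monitor: it assumes equal-width bins taken from the baseline sample, clips new values into the baseline range, and floors empty bins at a small epsilon; the function name psi and the synthetic samples are hypothetical.

```python
import numpy as np

def psi(expected, actual, bins=10, eps=1e-6):
    # Bin edges come from the baseline (expected) sample
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Clip new values into the baseline range so none fall outside the bins
    actual = np.clip(actual, edges[0], edges[-1])
    expected_counts, _ = np.histogram(expected, bins=edges)
    actual_counts, _ = np.histogram(actual, bins=edges)
    # Convert counts to fractions, flooring at eps to avoid log(0)
    expected_frac = np.clip(expected_counts / expected_counts.sum(), eps, None)
    actual_frac = np.clip(actual_counts / actual_counts.sum(), eps, None)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return float(np.sum((actual_frac - expected_frac) * np.log(actual_frac / expected_frac)))

rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=5000)  # hypothetical training-time feature
shifted = rng.normal(loc=1.0, scale=1.0, size=5000)   # hypothetical drifted feature

psi_same = psi(baseline, baseline)
psi_shifted = psi(baseline, shifted)
print(psi_same, psi_shifted)
```

Each term of the sum is non-negative, so the index is zero only when the binned distributions match and grows as they diverge.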
  • Asked at Amazon
    Video answer for 'What are common linear regression problems?'

    Ilnur I. - "I'll try to summarize their discussion as I remember it. Linear regression is a method for predicting a target (Y) from features (X). The prediction is a linear function of the features. The aim is to choose the coefficients (theta) of the prediction function so that the difference between target and prediction is smallest on average. A function measuring this difference between target and prediction is called the loss function. The form of this loss function can depend on the particular real"

    Data Scientist
    Machine Learning
    +2 more
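
The summary above, choosing coefficients theta so that predictions are closest to the target on average, can be sketched with a least-squares fit. The answer notes that the loss can take different forms; this example assumes mean squared error, and the toy data (generated from y = 1 + 2x) is hypothetical.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x with no noise
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix with a column of ones for the intercept term
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution: theta minimizing the squared error ||X @ theta - y||^2
theta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

predictions = X @ theta
mse = float(np.mean((predictions - y) ** 2))
print(theta, mse)
```

With a different loss, such as mean absolute error, the closed-form least-squares solver no longer applies and an iterative optimizer would be needed instead.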
  • Machine Learning
    Concept
  • Machine Learning Engineer
    Machine Learning
    +1 more
  • Machine Learning
    System Design
Showing 21-40 of 67