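Dinar M. - "An even faster, fully vectorized version: use np.linalg.norm to avoid the loop, and np.argpartition to select the k lowest distances. We don't need to sort the whole array; we only need to be sure that the first k elements are no larger than the rest.
```python
import numpy as np

def knn(Xtrain, ytrain, X_new, k):
    # Euclidean distance from X_new to every training row (broadcast over rows)
    distances = np.linalg.norm(Xtrain - X_new, axis=1)
    # O(N) selection instead of an O(N log N) sort; kth=k-1 puts the k smallest
    # distances in the first k slots (unordered) and also works when k == N
    k_indices = np.argpartition(distances, k - 1)[:k]
    # Majority vote for binary 0/1 labels
    return int(np.sum(ytrain[k_indices]) > k / 2.0)
```
"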
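As a quick sanity check of the claim that a partial partition is enough, the sketch below (random distances are an assumption for illustration) confirms that the first k indices from np.argpartition form the same set as the first k from a full np.argsort; only their order differs.

```python
import numpy as np

rng = np.random.default_rng(0)
distances = rng.random(1_000)  # hypothetical distances; ties are practically impossible
k = 5

# Partial selection: the first k slots hold the k smallest values, in no particular order
partitioned = np.argpartition(distances, k - 1)[:k]
# Full sort for comparison: the first k indices in ascending order of distance
sorted_idx = np.argsort(distances)[:k]

same_set = set(partitioned.tolist()) == set(sorted_idx.tolist())
print(same_set)  # True
```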
Taha U. - "In detail: setting k=1 in KNN makes the model fit the training data very closely, capturing much of the data's noise, so the model may not generalize well to unseen data. This is a high-variance scenario."
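One way to see this, as a minimal sketch with made-up data: with k=1 every training point is its own nearest neighbor, so training accuracy is perfect even when the labels are pure noise, while leave-one-out accuracy (a point may not be its own neighbor) stays near chance.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))      # hypothetical features
y = rng.integers(0, 2, size=200)   # labels are pure noise: there is no real pattern

# Pairwise Euclidean distances between all training points
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

# 1-NN scored on the training set: each point's nearest neighbor is itself (distance 0)
train_acc = np.mean(y[D.argmin(axis=1)] == y)

# Leave-one-out: forbid a point from being its own neighbor
D_loo = D.copy()
np.fill_diagonal(D_loo, np.inf)
loo_acc = np.mean(y[D_loo.argmin(axis=1)] == y)

print(train_acc)  # 1.0
print(loo_acc)    # close to 0.5, since the labels carry no signal
```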
Frank A. - "1) Create the experimental and control groups.
2) Calculate each group's conversion rate as the proportion (mean) of True values in the convert column, which counts True as 1 and False as 0.
3) Compute the test statistic by subtracting the two proportions and standardizing the difference.
4) Get the p-value and compare it with 0.05.
5) Conclude that the difference is statistically significant if the p-value is less than 0.05; otherwise, there is no statistically significant difference."
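The steps above describe a two-proportion z-test. A minimal sketch, assuming each group arrives as a list of booleans (the function name and sample data are hypothetical) and using the pooled standard error; math.erfc gives the two-sided normal p-value without needing SciPy.

```python
import math

def two_proportion_ztest(control, treatment):
    """Two-sided z-test for a difference in conversion rates.

    control, treatment: sequences of booleans (True = converted).
    Returns (z, p_value). Assumes the pooled rate is strictly between 0 and 1.
    """
    n1, n2 = len(control), len(treatment)
    x1, x2 = sum(control), sum(treatment)        # True counts as 1, False as 0
    p1, p2 = x1 / n1, x2 / n2                    # conversion rates (means)
    pooled = (x1 + x2) / (n1 + n2)               # common rate under H0: no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se                           # standardized difference
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return z, p_value

# Hypothetical data: 100/1000 conversions in control, 130/1000 in treatment
control = [True] * 100 + [False] * 900
treatment = [True] * 130 + [False] * 870
z, p = two_proportion_ztest(control, treatment)
significant = p < 0.05
print(round(z, 2), round(p, 3), significant)
```

The pooled standard error is used because, under the null hypothesis, both groups share one conversion rate.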
Abinash S. - "While running `test_loop` I get the error `RuntimeError: running_mean should contain 28 elements not 38`.
I think it's caused by the difference between the categorical features in train and test."
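If that diagnosis is right, a likely cause is one-hot encoding the train and test splits separately: a category missing from one split changes the number of columns, so the input width no longer matches the width the BatchNorm layer was created with. A minimal sketch of the fix, assuming plain lists of category strings (the helpers fit_categories and one_hot are hypothetical): learn the category list from the training data only and encode every split with it, so unseen test categories become all-zero rows.

```python
import numpy as np

def fit_categories(train_values):
    """Learn a fixed, sorted category vocabulary from the training split only."""
    return sorted(set(train_values))

def one_hot(values, categories):
    """Encode values against a fixed vocabulary; unseen categories map to all zeros."""
    index = {cat: i for i, cat in enumerate(categories)}
    out = np.zeros((len(values), len(categories)), dtype=np.float32)
    for row, value in enumerate(values):
        col = index.get(value)
        if col is not None:
            out[row, col] = 1.0
    return out

train_colors = ["red", "green", "blue", "green"]  # hypothetical column
test_colors = ["red", "purple"]                   # "blue"/"green" absent, "purple" unseen

categories = fit_categories(train_colors)
X_train = one_hot(train_colors, categories)
X_test = one_hot(test_colors, categories)
print(X_train.shape, X_test.shape)  # (4, 3) (2, 3): same width for both splits
```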
Jyoti V. - "Overfitting occurs when a model fits the training data too closely and fails to generalize to new data (high variance), whereas underfitting occurs when the model cannot uncover the underlying pattern in the training data (high bias).
Tree-based models like decision trees and random forests are prone to overfitting, whereas linear models like linear regression and logistic regression tend to underfit.
There are many reasons why a random forest can overfit easily: 1. The model has grown to its full depth a"
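To make the bias/variance contrast concrete without trees, here is a sketch that swaps in polynomial regression (the degrees and the noisy sine data are assumptions for illustration): a degree-1 fit underfits the curved signal, while a high-degree fit can chase the noise. Because each lower-degree model is nested inside the higher-degree ones, the high-degree training error cannot be larger; it is the gap between training and test error that signals overfitting.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical data: a sine curve plus Gaussian noise
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 12):
    # Least-squares fit; Polynomial.fit rescales x internally for numerical stability
    model = Polynomial.fit(x_train, y_train, degree)
    train_err, test_err = mse(model, x_train, y_train), mse(model, x_test, y_test)
    results[degree] = (train_err, test_err)
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```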
Abinash S. - "I checked: the unit test fails with a False assertion, as you can see in the Colab notebook below.
```
F
FAIL: test_simple (__main__.Conv2dTest)
Traceback (most recent call last):
  File "", line 19, in test_simple
    self.assertTrue(torch.equal(output, torch.tensor([[[[ 5., 1.], [ -2., -10.]]]])))
AssertionError: False is not true
```
"
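One possible cause worth ruling out (an assumption, since the full notebook isn't shown): torch.equal requires the same size and exactly equal elements, so a shape mismatch or tiny floating-point differences are enough to fail it, whereas torch.allclose compares within a tolerance. Below is a hedged sketch of a NumPy reference for a single-channel, stride-1, no-padding Conv2d, which, like PyTorch's, computes a cross-correlation (the kernel is not flipped), checked with a tolerance. The input and kernel are hypothetical, not the ones from the test.

```python
import numpy as np

def conv2d_reference(x, w):
    """Valid 2D cross-correlation: out[i, j] = sum(x[i:i+kh, j:j+kw] * w)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.result_type(x, w))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

x = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])  # hypothetical input
w = np.array([[1., 0.],
              [0., -1.]])     # hypothetical kernel

out = conv2d_reference(x, w)  # each entry is x[i, j] - x[i + 1, j + 1] = -4

# Tolerant comparison, analogous to torch.allclose, instead of exact equality
assert np.allclose(out, np.array([[-4., -4.], [-4., -4.]]))
```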
L B. - "For data distribution drift: KL divergence or PSI (Population Stability Index).
For performance, there are two categories: first, operational metrics such as runtime; second, model performance: the loss function and MAE (for regression), plus business metrics such as overall watch time, DAU, and revenue lift.
For outliers: data distribution."
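A minimal sketch of PSI and KL divergence for a numeric feature, assuming bins taken from the deciles of the baseline (training) data and a small epsilon so empty bins don't produce log(0); the bin count, epsilon, and synthetic data are illustrative choices, not a standard. PSI sums (actual - expected) * ln(actual / expected) over bins, which equals KL(actual || expected) + KL(expected || actual).

```python
import numpy as np

def psi_from_proportions(expected, actual, eps=1e-6):
    """PSI = sum((a - e) * ln(a / e)) over bins; eps guards against empty bins."""
    e = np.clip(np.asarray(expected, dtype=float), eps, None)
    a = np.clip(np.asarray(actual, dtype=float), eps, None)
    return float(np.sum((a - e) * np.log(a / e)))

def kl_from_proportions(expected, actual, eps=1e-6):
    """KL(actual || expected) = sum(a * ln(a / e)) over the same bins."""
    e = np.clip(np.asarray(expected, dtype=float), eps, None)
    a = np.clip(np.asarray(actual, dtype=float), eps, None)
    return float(np.sum(a * np.log(a / e)))

def binned_proportions(baseline, current, bins=10):
    """Bin both samples on the baseline's quantile edges; return (expected, actual)."""
    inner_edges = np.quantile(baseline, np.linspace(0, 1, bins + 1)[1:-1])

    def props(values):
        idx = np.searchsorted(inner_edges, values, side="right")  # bin index 0..bins-1
        return np.bincount(idx, minlength=bins) / len(values)

    return props(baseline), props(current)

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)  # hypothetical training-time feature
current = rng.normal(0.3, 1.0, 10_000)   # hypothetical shifted production feature
expected, actual = binned_proportions(baseline, current)
print(round(psi_from_proportions(expected, actual), 3),
      round(kl_from_proportions(expected, actual), 3))
```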
Ilnur I. - "I'll try to summarize their discussion as I remember it.
Linear regression is one method for predicting a target (Y) from features (X).
The prediction function in linear regression is a linear function of the features. The aim is to choose the coefficients (theta) of the prediction function so that the difference between the target and the prediction is as small as possible on average.
This difference between target and prediction is measured by the loss function. The form of this loss function could depend on the particular real"
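For the common squared-error loss, choosing theta means minimizing the mean of (y - X theta)^2, which ordinary least squares solves directly. A minimal sketch with NumPy, assuming squared error (other losses generally need other solvers) and noise-free synthetic data so the recovered coefficients can be checked:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2.0 + 3.0 * x  # hypothetical target: intercept 2, slope 3

# Design matrix with a column of ones so theta[0] acts as the intercept
X = np.column_stack([np.ones_like(x), x])

# Least squares: theta minimizes sum((y - X @ theta)**2), and therefore its mean as well
theta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
mse = np.mean((y - X @ theta) ** 2)
print(theta)  # approximately [2. 3.]
```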