Data Scientist Concept Interview Questions

Review this list of 15 Concept Data Scientist interview questions and answers verified by hiring managers and candidates.

SQL Stored Procedures
Data Scientist
Concept
+4 more
2 answers
Amney M. - "it is really good explanation thanks it is really good explanation thanks"
Asked at Meta (Facebook), Goldman Sachs, LinkedIn • a year ago
Explain Bayes' theorem.
Data Scientist
Concept
+2 more
3 answers
Will I. - "Is it bad to get the answer a different way? Will they mark that as not knowing Bayes' theorem, or as correct, since it is an easier way to get the answer? My approach was to look at what happens when the factory makes 100 light bulbs. Machine A makes 60, of which 3 are faulty; Machine B makes 40, of which 1.2 are faulty. Therefore, of the 4.2 faulty light bulbs, 3/4.2 = 5/7 come from Machine A and 1.2/4.2 = 2/7 from Machine B."
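The counting argument in the answer above gives the same result as applying Bayes' theorem directly. A minimal sketch, assuming the fault rates implied by the answer's counts (3 of 60 for Machine A, 1.2 of 40 for Machine B); the function name `posterior` is made up for illustration:

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(cause | evidence) for each cause via Bayes' theorem."""
    # Joint probability P(cause) * P(evidence | cause) for each cause
    joint = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    # Total probability of the evidence (here: a bulb being faulty)
    evidence = sum(joint.values())
    return {cause: value / evidence for cause, value in joint.items()}

# Share of production per machine (assumed from the answer: 60 and 40 of 100 bulbs)
priors = {"A": Fraction(60, 100), "B": Fraction(40, 100)}
# Fault rate per machine (assumed from the answer: 3/60 and 1.2/40)
fault_rates = {"A": Fraction(5, 100), "B": Fraction(3, 100)}

result = posterior(priors, fault_rates)
print(result["A"], result["B"])
```

Exact fractions are used so the two posteriors can be compared with the hand calculation and checked to sum to 1.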
Asked at Amazon • 3 years ago
Session Data Analysis.
Hard
Data Scientist
Concept
+4 more
6 answers
Erjan G. - "1) select avg(session) from table where session > 180 2) select round(sessiontime/300)*300 as session_bin, count(*) as sessioncount from table group by round(sessiontime/300)*300 order by session_bin 3) SELECT t1.country AS country_a, t2.country AS country_b FROM ( SELECT country, COUNT(*) AS session_count FROM yourtablename GROUP BY country ) AS t1 JOIN ( SELECT country, COUNT(*) AS session_count FROM yourtablename GROUP BY countr"
Asked at Nvidia, OpenAI • a year ago
What is overfitting or underfitting? Which models are most likely to experience this, and why?
Data Scientist
Concept
+2 more
4 answers
Jyoti V. - "Overfitting occurs when a model fails to generalize to new data and has high variance with respect to the training data, whereas in underfitting the model cannot uncover the underlying pattern in the training data and has high bias. Tree-based models such as decision trees and random forests are likely to overfit, whereas linear models such as linear regression and logistic regression tend to underfit. There are many reasons why a random forest can overfit easily: 1. Model has grown to its full depth a"
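The contrast between high bias and high variance can be seen on a toy dataset. The sketch below is not the tree or linear models named in the answer; it swaps in two deliberately extreme stand-ins that need no external library: a constant predictor (underfits) and a 1-nearest-neighbour predictor that memorizes the training set (overfits). The data-generating rule and all names are assumptions made for illustration.

```python
import random

random.seed(0)

def make_data(n):
    """Noisy linear data: y = x + Gaussian noise (assumed toy process)."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [x + random.gauss(0, 1) for x in xs]
    return xs, ys

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(50)
test_x, test_y = make_data(50)

mean_y = sum(train_y) / len(train_y)

def underfit(x):
    # High bias: ignores x and always predicts the training mean
    return mean_y

def overfit(x):
    # High variance: returns the label of the closest training point,
    # so it reproduces the training noise exactly
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

for name, model in [("constant", underfit), ("1-NN", overfit)]:
    print(name, "train:", mse(model, train_x, train_y),
          "test:", mse(model, test_x, test_y))
```

The memorizing model has zero error on the data it was fitted to, which is the signature of overfitting: training error alone says nothing about how well it generalizes.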
Asked at Amazon, Apple, Walmart Labs • 6 months ago
What is the difference between NoSQL and SQL?
Data Scientist
Concept
+7 more
3 answers
Ali H. - "SQL databases are relational, NoSQL databases are non-relational. SQL databases use structured query language and have a predefined schema. NoSQL databases have dynamic schemas for unstructured data. SQL databases are vertically scalable, while NoSQL databases are horizontally scalable."
Asked at Deloitte • a year ago
Explain the key differences between BETWEEN and HAVING clauses in SQL.
Data Scientist
Concept
+4 more
3 answers
Meenakshi D. - "BETWEEN and HAVING clauses in SQL serve different purposes: 1. BETWEEN Clause Used to filter rows based on a range of values. Works with numeric, date, or text values. Can be used with WHERE or HAVING clauses. The range includes both lower and upper bounds. Example: Filtering employees with salaries between 30,000 and 50,000 `SELECT * FROM Employees WHERE salary BETWEEN 30000 AND 50000;` 2. HAVING Clause Used to filter **groups"
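The two clauses from the answer above can be tried side by side with Python's built-in sqlite3 module. This is a sketch only: the `Employees` table and `salary` column come from the answer, while the `name` and `department` columns, the sample rows, and the 40000 threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", 30000),
        ("Bob", "Sales", 50000),
        ("Cy", "Eng", 60000),
        ("Di", "Eng", 45000),
    ],
)

# BETWEEN filters individual rows; both bounds are inclusive
in_range = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM Employees "
        "WHERE salary BETWEEN 30000 AND 50000 ORDER BY name"
    )
]

# HAVING filters groups after aggregation
high_avg = conn.execute(
    "SELECT department, AVG(salary) FROM Employees "
    "GROUP BY department HAVING AVG(salary) > 40000"
).fetchall()

print(in_range)
print(high_avg)
```

Ann (30000) and Bob (50000) sit exactly on the bounds and are still returned, which shows the inclusive range. The Sales group averages exactly 40000, so the strict `>` in HAVING drops it.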
Asked at Google, LendingClub, Nvidia + 1 more • a year ago
Explain Deep Learning to a non-technical audience.
Data Scientist
Concept
+3 more
4 answers
Surbhi G. - "Deep Learning is a part of Artificial Intelligence; it's like teaching the machine to think and make decisions on its own. It's like how we teach a child the concept of an apple: it's round, red, and has a stem on top. We show them multiple pictures of apples, and then they understand and can recognize an apple in the future. Similarly, we feed lots of data to the machine, and slowly it starts learning from that data and can then make relevant predictions or decisions based on what it has learnt. A co"
Asked at OpenAI • a year ago
Explain deep reinforcement learning.
Data Scientist
Concept
+1 more
2 answers
Constantin P. - "Reinforcement Learning is a type of machine learning where an agent learns to make decisions by trying out different actions and receiving rewards or penalties in return. The goal is to learn, over time, which actions yield the highest rewards. There are three core components in RL: the agent, which is the learner or decision-maker (e.g., an algorithm or robot); the environment, which is everything the agent interacts with; and actions and rewards: the agent takes actions, and the environmen"
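The agent, environment, action, and reward loop described in the answer above can be sketched with tabular Q-learning on a made-up five-cell corridor. This is not deep reinforcement learning itself: a deep variant would replace the lookup table `q` with a neural network. The environment, constants, and names are all assumptions for illustration.

```python
import random

random.seed(0)

N_STATES = 5                  # cells 0..4 on a line; cell 4 is the goal
ACTIONS = ("left", "right")
GAMMA = 0.9                   # discount factor
ALPHA = 1.0                   # learning rate; 1.0 works here because the toy
                              # environment is deterministic

def step(state, action):
    """Environment: move one cell; reward 1.0 only when the goal is reached."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

# Q-table: estimated return for every (state, action) pair
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(200):          # episodes
    state = 0
    while state != N_STATES - 1:
        action = random.choice(ACTIONS)          # agent explores at random
        nxt, reward = step(state, action)        # environment responds
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        # Q-learning update toward reward + discounted best future value
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt

# Greedy policy read off the learned table for the non-terminal cells
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Although the agent only ever acts at random, the update rule learns the value of acting greedily, so the policy read from the table heads toward the goal from every cell.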
Asked at Amazon • 4 years ago
What are common linear regression problems?
Data Scientist
Concept
+2 more
2 answers
Ilnur I. - "I can try to summarize their discussion as I remember it. Linear regression is one method for predicting a target (Y) from features (X). The linear regression formula is a linear function of the features. The aim is to choose the coefficients (theta) of the prediction function so that the difference between target and prediction is smallest on average. This difference between target and prediction is measured by the loss function. The form of this loss function can depend on the particular real"
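The idea of choosing coefficients that minimize the average difference between target and prediction can be made concrete for a single feature. A minimal sketch, assuming mean squared error as the loss (the answer leaves the loss open) and using made-up data; `fit_line` and `mse_loss` are illustrative names:

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def mse_loss(theta, xs, ys):
    """Mean squared difference between target and prediction."""
    intercept, slope = theta
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data generated from y = 1 + 2x with no noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta = fit_line(xs, ys)
print(theta, mse_loss(theta, xs, ys))
```

Because the toy data lie exactly on a line, the fitted coefficients recover the generating intercept and slope and the loss is zero; with noisy data the same formulas return the line with the smallest mean squared error.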
Asked at Google • 4 years ago
What is the best way to connect SQL databases and why?
Data Scientist
Concept
+5 more
2 answers
Aaron W. - "Clarification questions: What is the purpose of connecting the DB? Do we expect high volumes of traffic to hit the DB? Do we have scalability or reliability concerns? Format: (1) Code -> DB. (2) Code -> Cache -> DB. (3) API -> Cache -> DB: APIs are built for a purpose and have a specified protocol (GET, POST, DELETE) to speak to the DB. APIs can also use a contract to retrieve information from a DB much faster than code. (4) Load balanced APIs -> Cache -> DB **Aut"
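The "Code -> Cache -> DB" layout from the answer above amounts to a read-through cache in front of the database. A minimal sketch using an in-memory SQLite database and a plain dictionary as the cache; the `users` table, the `get_user_name` helper, and the read counter are hypothetical and stand in for whatever store and cache a real system would use:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")

cache = {}                    # stand-in for an external cache service
stats = {"db_reads": 0}       # counts how often the database is actually hit

def get_user_name(user_id):
    """Return the user's name, consulting the cache before the database."""
    if user_id in cache:
        return cache[user_id]
    stats["db_reads"] += 1
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    name = row[0] if row else None
    cache[user_id] = name     # populate the cache for later reads
    return name

first = get_user_name(1)      # cache miss: reads from the database
second = get_user_name(1)     # cache hit: no database access
print(first, second, stats["db_reads"])
```

The sketch leaves out cache invalidation and expiry, which matter as soon as the underlying rows can change.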
Describe how the split in a decision tree works.
Data Scientist
Concept
+1 more
Asked at Salesforce • a year ago
Given n employees and an expansion rate r, find the number of employees after t years.
Data Scientist
Concept
1 answer
Asish B. - "Number of employees after the first year = n*(1+r) = n1. Number of employees after the second year = n1*(1+r) = n*(1+r)**2. Hence, the number of employees after t years = n*(1+r)**t."
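The compounding argument in the answer above translates directly into a one-line function. A small sketch, assuming r is the yearly growth rate expressed as a fraction (0.10 for 10%) and t is a whole number of years; the function name and sample values are made up:

```python
def employees_after(n, r, t):
    """Headcount after t years of compound growth at rate r, starting from n."""
    return n * (1 + r) ** t

# Example: 100 employees growing 10% per year for 2 years
headcount = employees_after(100, 0.10, 2)
print(headcount)
```

The result is a float and may carry a tiny rounding error, so a real headcount would need an explicit rounding rule.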
Asked at Goldman Sachs • a year ago
If a + b + c + d = 63, what's the maximum value of a * b + b * c + c * d?
Data Scientist
Concept
Asked at Goldman Sachs • a year ago
You roll a die until the total reaches 100 or more. Which appears more often in the rolls: 1 or 6?
Data Scientist
Concept
Asked at Goldman Sachs • a year ago
What is the limit of the ratio of two consecutive Fibonacci numbers as the sequence progresses?
Data Scientist
Concept