"What do all data scientists need to know about how to work with very large datasets?
37
Follow
Request
Answer
More
All related (39)
Recommended
📷
Corrin Lakeland
·
Follow
, M.S. Data Science, University of St. Thomas, St. Paul (2018)6yData Science consultant and managerUpvoted by[Tom Halloin](https://www.quora"
Hayatu H. - "What do all data scientists need to know about how to work with very large datasets?
Corrin Lakeland, M.S. Data Science, University of St. Thomas, St. Paul (2018) · 6y · Data Science consultant and manager · Upvoted by [Tom Halloin](https://www.quora"
"Use a representative of each, e.g. sort the string and add it to the value of a hashmap> where we put all the words that belong to the same anagram together."
Gaston B. - "Use a representative for each group, e.g. sort the letters of each string and use the result as the key of a hashmap, whose value collects all the words that belong to the same anagram group."
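A minimal Python sketch of the sorted-key idea described above. The function name `group_anagrams` and the choice of `defaultdict` are illustrative assumptions, not part of the original answer.

```python
from collections import defaultdict


def group_anagrams(words):
    """Group words that are anagrams of each other (hypothetical helper)."""
    groups = defaultdict(list)
    for word in words:
        # The sorted letters act as the representative shared by every anagram.
        key = "".join(sorted(word))
        groups[key].append(word)
    # Groups come back in the order their first member was seen.
    return list(groups.values())
```

Sorting each word costs O(k log k) for a word of length k. A letter-count tuple could serve as the key instead if the alphabet is small and fixed.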
"Missing Item - User ordered multiple items, few items are missing
Wrong Item - Entire order is wrong / there are items in the order that were never ordered
How is this measured ?
CSAT
Missing Items
Wrong Items
Step 1 :
Collect data on orders that reported missing / wrong items. Dive deep to understand if the problem is isolated to a specific metro/zip code/restaurant type (say fast food vs fine dine), time of day (lunch vs dinner), tenure of the courier on th"
Saurabh K. - "Missing Item - User ordered multiple items and a few of them are missing
Wrong Item - The entire order is wrong, or the order contains items that were never ordered
How is this measured?
CSAT
Missing Items
Wrong Items
Step 1:
Collect data on orders that reported missing or wrong items. Dive deep to understand whether the problem is isolated to a specific metro, zip code, or restaurant type (say fast food vs. fine dining), time of day (lunch vs. dinner), tenure of the courier on th"
"we can use two pointer + set like maintain i,j and also insert jth character to set like while set size is equal to our window j-i+1 then maximize our answer and increase jth pointer till last index"
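Kishor J. - "We can use two pointers plus a set: maintain i and j, and insert the jth character into the set. While the set size equals the window size j-i+1, the window has no repeated characters, so maximize the answer and advance j until the last index. When a repeat appears, remove characters from the left and advance i until the window is unique again."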
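The two-pointer-plus-set approach above can be sketched in Python as follows. This assumes the underlying question is the length of the longest substring without repeating characters, which the answer does not state explicitly; the name `longest_unique_substring` is hypothetical.

```python
def longest_unique_substring(s):
    """Return the length of the longest window of s with no repeated character."""
    seen = set()  # characters currently inside the window s[i..j]
    best = 0
    i = 0
    for j, ch in enumerate(s):
        # Shrink from the left until the incoming character is no longer a repeat.
        while ch in seen:
            seen.remove(s[i])
            i += 1
        seen.add(ch)
        # Here len(seen) == j - i + 1, so the window is duplicate-free.
        best = max(best, j - i + 1)
    return best
```

Each character enters and leaves the set at most once, so the scan is linear in the length of the string.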
"#inplace reversal without inbuilt functions
def reverseString(s):
chars = list(s)
l, r = 0, len(s)-1
while l < r:
chars[l],chars[r] = chars[r],chars[l]
l += 1
r -= 1
reversed = "".join(chars)
return reversed
"
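Anonymous Possum - "# Reversal with two-pointer swaps, without built-in reverse helpers
def reverseString(s):
    # Strings are immutable in Python, so the swaps happen in a list copy.
    chars = list(s)
    l, r = 0, len(chars) - 1
    while l < r:
        chars[l], chars[r] = chars[r], chars[l]
        l += 1
        r -= 1
    # Avoid naming the result `reversed`, which would shadow the built-in.
    return "".join(chars)
"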
"Data lake and warehouse are both places that allow an organization to store large amounts of data.
When swimming in a lake, one would imagine that they come across all sorts of stuff - floating twigs, fish in the water, stones, chemicals and sometimes may be even a snake. Similarly, a data lake stores all forms of data that the company has without any indexing. The data is available at any time but needs to be first cleaned up and reorganized before it can be used for any type of analysis.
A"
Kshitij I. - "A data lake and a data warehouse are both places where an organization can store large amounts of data.
When swimming in a lake, one can imagine coming across all sorts of stuff: floating twigs, fish in the water, stones, chemicals, and sometimes maybe even a snake. Similarly, a data lake stores all forms of data that the company has, without any indexing. The data is available at any time, but it first needs to be cleaned up and reorganized before it can be used for any type of analysis.
A"
"Any cycle would cause the prerequisite to be greater than the course. This passes all the tests:
function canFinish(_numCourses, prerequisites) {
for (const [a, b] of prerequisites) {
if (b > a) return false
}
return true
}
`"
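Jeremy D. - "Any cycle of two or more courses would cause at least one prerequisite to be numbered higher than its course. This check passes all the tests, but it assumes valid inputs always give prerequisites lower numbers than their courses, so it is not a general cycle check:
function canFinish(_numCourses, prerequisites) {
  // Reject as soon as any prerequisite b is numbered higher than its course a.
  for (const [a, b] of prerequisites) {
    if (b > a) return false
  }
  return true
}
"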
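For comparison, a general cycle check that does not depend on how courses are numbered can be sketched with Kahn's topological sort. This is a swapped-in technique, written in Python rather than the answer's JavaScript; the name `can_finish` and the `[course, prerequisite]` pair order are assumptions that mirror the snippet above.

```python
from collections import deque


def can_finish(num_courses, prerequisites):
    """Return True if every course can be taken, i.e. the graph has no cycle."""
    indegree = [0] * num_courses                  # unmet prerequisites per course
    unlocks = [[] for _ in range(num_courses)]    # prereq -> courses it unlocks
    for course, prereq in prerequisites:
        unlocks[prereq].append(course)
        indegree[course] += 1

    # Start with the courses that have no prerequisites at all.
    ready = deque(c for c in range(num_courses) if indegree[c] == 0)
    taken = 0
    while ready:
        current = ready.popleft()
        taken += 1
        for nxt in unlocks[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Courses stuck on a cycle never reach indegree 0, so they are never taken.
    return taken == num_courses
```

Unlike the numbering check, this accepts an acyclic input such as `[[0, 1]]` and rejects a self-loop such as `[[0, 0]]`.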
"There are couple of reasons for it -
Kind of role : Its a product manager role loaded with analytical work, So working with data in stringent regulatory guideline make it more exciting and thrilling.
Location & industry is like - Cherry on the cake, Bangalore weather and BFI is at its all time peak as people spending behavior is changing continuously, it will be interesting to see big giants like visa are managing it."
Nidhi S. - "There are a couple of reasons for it:
Kind of role: It's a product manager role loaded with analytical work, and working with data under stringent regulatory guidelines makes it more exciting and thrilling.
Location & industry: These are the cherry on the cake. There is the Bangalore weather, and BFI is at its all-time peak as people's spending behavior is changing continuously, so it will be interesting to see how big giants like Visa are managing it."
"
A couple of years ago, we were working on a project to integrate a new third-party data feed into our existing data processing pipeline. This data feed was critical for enhancing our trading algorithms with more comprehensive market data. Given the tight timeline and high stakes, I decided to push for a rapid implementation.
In my eagerness to meet the deadline, I underestimated the complexity of integrating this new data feed. I did not allocate sufficient time for thorough testing and valida"
Scott S. - "A couple of years ago, we were working on a project to integrate a new third-party data feed into our existing data processing pipeline. This data feed was critical for enhancing our trading algorithms with more comprehensive market data. Given the tight timeline and high stakes, I decided to push for a rapid implementation.
In my eagerness to meet the deadline, I underestimated the complexity of integrating this new data feed. I did not allocate sufficient time for thorough testing and valida"
"this solution here is much faster than the exponent reference soln. It is also far more concise and easy to understand
def moveZerosToEnd(arr: List[int]) -> List[int]:
left = 0
for right in range(len(arr)):
if arr[right] == 0:
pass
else:
if left != right:
temp = arr[left]
arr[left] = arr[right]
arr[right] = temp
left += 1
return arr
`"
Devesh K. - "This solution is much faster than the Exponent reference solution. It is also far more concise and easier to understand.
from typing import List

def moveZerosToEnd(arr: List[int]) -> List[int]:
    left = 0  # next slot that should hold a non-zero value
    for right in range(len(arr)):
        if arr[right] != 0:
            # Swap only when the slots differ, to skip redundant writes.
            if left != right:
                arr[left], arr[right] = arr[right], arr[left]
            left += 1
    return arr
"