
Databricks Interview Questions

Review this list of 18 Databricks interview questions and answers verified by hiring managers and candidates.
  • Asked at Databricks

    Tracy M. - "Ingestion, processing, and storage layers to handle document processing. Flow: client -> API gateway/entry point -> object storage -> queue -> worker -> database. Data flow: the client initiates the document upload and status processing. API gateway: an upload endpoint (authenticates and authorizes the request, creates a pre-signed URL for uploading the document) and a status endpoint. Object storage: stores the uploaded unstructured documents (images, PDFs, DOCX, etc.) via the pre-signed URL. Message queue: decouples ingestion from proc"

    Software Engineer
    Data Pipeline Design
    +2 more
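
    The flow in the answer above can be sketched end to end with in-memory stand-ins. This is a minimal illustration, not a production design: `object_store`, `status_db`, `jobs`, the token check, and every function name are hypothetical, and the upload key merely plays the role of the pre-signed URL.

```python
import queue
import uuid

# In-memory stand-ins (assumptions, not real services).
object_store = {}       # object key -> uploaded bytes
status_db = {}          # document id -> status string
jobs = queue.Queue()    # message queue decoupling ingestion from processing


def request_upload(user_token):
    """Upload endpoint: authenticate, then return (doc_id, upload_key).

    The key stands in for the pre-signed URL handed back to the client.
    """
    if user_token != "valid-token":  # placeholder auth check
        raise PermissionError("unauthorized")
    doc_id = str(uuid.uuid4())
    status_db[doc_id] = "awaiting_upload"
    return doc_id, f"uploads/{doc_id}"


def upload(doc_id, key, payload):
    """Client writes directly to object storage, then a job is enqueued."""
    object_store[key] = payload
    status_db[doc_id] = "queued"
    jobs.put((doc_id, key))


def run_worker():
    """Worker drains the queue and records the outcome in the database."""
    while not jobs.empty():
        doc_id, key = jobs.get()
        size = len(object_store[key])  # stand-in for real document processing
        status_db[doc_id] = f"processed:{size}"


def get_status(doc_id):
    """Status endpoint: report where the document is in the pipeline."""
    return status_db[doc_id]
```

    Because the client only ever talks to the upload and status endpoints, the worker can be scaled or replaced without changing the client-facing contract.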
  • Asked at Databricks

    Sreeram reddy B. - "User table: userid, username, email, phonenumber, accountcreateddate. Exercises table (types of exercise such as indoor walk, outdoor walk, running, stairs, cycling, swimming): exerciseid, exercisetype. Date table: dateid, date, day, month, year. Session table: userid, sessiondateid (linked to dateid in the date table), exerciseid, distance covered, calories spent, starttime, endtime."

    Data Engineer
    Data Modeling
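
    The four tables in the answer above map onto a small star schema. The sketch below builds it in an in-memory SQLite database; the column types, the `sessionid` surrogate key, the `distance_km` and `calories` column names, and the sample rows are assumptions added for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    userid INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    email TEXT,
    phonenumber TEXT,
    accountcreateddate TEXT
);
CREATE TABLE exercises (
    exerciseid INTEGER PRIMARY KEY,
    exercisetype TEXT NOT NULL
);
CREATE TABLE dates (
    dateid INTEGER PRIMARY KEY,
    date TEXT NOT NULL,
    day INTEGER,
    month INTEGER,
    year INTEGER
);
CREATE TABLE sessions (
    sessionid INTEGER PRIMARY KEY,  -- surrogate key (assumption)
    userid INTEGER NOT NULL REFERENCES users(userid),
    sessiondateid INTEGER NOT NULL REFERENCES dates(dateid),
    exerciseid INTEGER NOT NULL REFERENCES exercises(exerciseid),
    distance_km REAL,
    calories REAL,
    starttime TEXT,
    endtime TEXT
);
"""


def build_demo_db():
    """Create the schema and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO users VALUES (1, 'ana', 'ana@example.com', NULL, '2024-01-01')"
    )
    conn.executemany(
        "INSERT INTO exercises VALUES (?, ?)", [(1, "running"), (2, "cycling")]
    )
    conn.execute("INSERT INTO dates VALUES (20240105, '2024-01-05', 5, 1, 2024)")
    conn.executemany(
        "INSERT INTO sessions (userid, sessiondateid, exerciseid, distance_km,"
        " calories, starttime, endtime) VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 20240105, 1, 5.0, 400.0, "07:00", "07:30"),
            (1, 20240105, 2, 12.0, 350.0, "18:00", "18:40"),
        ],
    )
    return conn


def calories_by_exercise(conn):
    """Total calories per exercise type: the fact table joined to one dimension."""
    return conn.execute(
        "SELECT e.exercisetype, SUM(s.calories) "
        "FROM sessions s JOIN exercises e USING (exerciseid) "
        "GROUP BY e.exercisetype ORDER BY e.exercisetype"
    ).fetchall()
```

    Keeping the session table narrow and pointing it at the user, exercise, and date tables means reporting queries reduce to a join plus an aggregate, as `calories_by_exercise` shows.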
  • Asked at Databricks

    Rahul J. - "Constraints: 4-direction moves; no mode switching (pick exactly one of {1=bicycle, 2=bike, 3=car, 4=bus} for the full trip). Per-mode search: if a mode's per-step time and cost are uniform, run BFS on the allowed cells. Then total_time = steps × time_per_step, with ties broken by steps × cost_per_step. If time and cost vary by cell (given as matrices), run Dijkstra per mode, minimizing (total_time, total_cost) lexicographically. Maintain the best ⟨time, cost⟩ pair per cell; relax when the new pair is strictly better. S"

    Software Engineer
    Data Structures & Algorithms
    +1 more
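
    The uniform-cost case in the answer above (one BFS per mode, then compare modes by time and break ties on cost) can be sketched as follows. The grid encoding is an assumption, since the original problem statement is not shown here: each cell holds the single mode allowed on it, and the start and goal cells are treated as enterable by every mode. The function names are hypothetical.

```python
from collections import deque


def bfs_steps(grid, start, goal, mode):
    """Fewest 4-direction steps from start to goal using only cells equal to `mode`.

    Returns None when the goal is unreachable with this mode.
    """
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        (r, c), steps = frontier.popleft()
        if (r, c) == goal:
            return steps
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            nxt = (nr, nc)
            if not (0 <= nr < rows and 0 <= nc < cols) or nxt in seen:
                continue
            # Assumption: the goal cell is open to every mode.
            if nxt == goal or grid[nr][nc] == mode:
                seen.add(nxt)
                frontier.append((nxt, steps + 1))
    return None


def best_mode(grid, start, goal, time_per_step, cost_per_step):
    """Return (total_time, total_cost, mode) for the best single mode, or None.

    Tuples compare lexicographically, so time is minimized first and cost
    breaks ties; the mode number is only a final, arbitrary tie-break.
    """
    best = None
    for mode in sorted(time_per_step):
        steps = bfs_steps(grid, start, goal, mode)
        if steps is None:
            continue
        candidate = (steps * time_per_step[mode], steps * cost_per_step[mode], mode)
        if best is None or candidate < best:
            best = candidate
    return best
```

    When per-cell times or costs differ, BFS no longer gives the optimum and the Dijkstra variant described in the answer would replace `bfs_steps`.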
  • Asked at Databricks
    Software Engineer
    System Design
    +1 more
  • Karthik R. - "This is yet another classic case of the data landscape evolving to accommodate diverse data formats: restrictive but key components were sacrificed at first and added back later to make the solution more effective. Data warehouse -> data lake -> data lakehouse (data lake + data warehouse). Data warehouse: a solution for storing data in a central place (analytics-heavy, i.e. read-heavy) with a stringent, structured schema. Very useful for historical queries and analytics. Schema-on-write check. Only used for"

    Data Engineer
    Data Pipeline Design
  • Asked at Databricks

    Amparo L. - "One accomplishment I'm most proud of is that I graduated from Schaumburg High School in May of 2021 and was able to get up on stage and collect my diploma. This was hugely significant: it meant passing all of my classes and earning all of my credits in order to be a part of the NOW Arena graduation."

    Product Manager
    Behavioral
    +1 more
  • Asked at Databricks
    Video answer for 'What is your leadership style?'

    onering2ruleall - "My leadership style is flexible and adaptive; it varies depending on the team members and the needs of the company. My leadership goal is to empower the team and to inspire and grow leaders. To achieve that, I combine transformational, democratic, and coaching leadership styles. Usually, when we are facing a new type of challenge or are at the early stage of a project, I like to adopt the transformational leadership style, which allows me to listen to all the suggestions from the team members and sta"

    Engineering Manager
    Behavioral
    +4 more
  • Howard H. - "We have to work with the C-suite to understand the direct quarterly outcomes or goals. This could be our epic, and then we try to break that down by business value and complexity. This will allow us to prioritize what's next. From there we can structure an MVP to cover maybe some of these areas to understand the estimation of this work. After the first couple of weeks we can structure a roadmap and then define when"

    Product Manager
    Behavioral
  • Asked at Databricks
    Video answer for 'Demo LabelBox for an Autonomous Delivery Client'
    Solutions Architect
    Customer Interaction
  • Anzhe M. - "Two questions come to mind: Does the second job have to kick off at 12:30 AM? Do other jobs depend on the second job? If both answers are no, we may simply postpone the second job to allow sufficient time for the first one to complete. If the answers are yes, we could let the second job retry up to a certain number of times. Make sure that even reaching the maximum number of retries won't delay or fail the following jobs."

    Data Engineer
    Data Pipeline Design
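
    The bounded-retry option in the answer above can be sketched as a small helper. This is an illustration under assumptions, not a scheduler feature: `run_with_retries`, `upstream_done`, and the `"skipped"` status are hypothetical names, and the injected `sleep` exists only so the wait can be stubbed out.

```python
import time


def run_with_retries(job, upstream_done, max_retries=3, wait_seconds=0.0,
                     sleep=time.sleep):
    """Run `job` once `upstream_done()` is true, rechecking up to `max_retries` times.

    Returns (status, attempts). Exhausting the retries yields "skipped" rather
    than raising, so jobs scheduled after this one are neither delayed
    indefinitely nor failed.
    """
    for attempt in range(1, max_retries + 2):  # one initial try plus the retries
        if upstream_done():
            job()
            return "succeeded", attempt
        if attempt <= max_retries:
            sleep(wait_seconds)  # give the first job more time to finish
    return "skipped", max_retries + 1
```

    Capping the total wait at `max_retries * wait_seconds` is what keeps the downstream schedule predictable; the cap would be chosen from the slack available before the next dependent job.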
  • Asked at Databricks
    Data Engineer
    Data Pipeline Design
  • Asked at Databricks

    Divya K. - "You will need to start from the browser and go all the way up to analytic systems and methods. Everything needs to be covered."

    Technical Program Manager
    Analytical
    +2 more
  • Data Engineer
    Data Pipeline Design
  • Asked at Databricks

    Nitish C. - "Delta Lake is a metadata layer on top of cloud storage that gives a data lake transactional capabilities. It helps implement upsert/merge, as it enforces a schema on the data assets stored in the cloud. It also offers various other capabilities such as liquid clustering, time travel, schema evolution, and deletes."

    Data Engineer
    Data Pipeline Design
  • Asked at Databricks

    Divya K. - "Explain how you implemented your telemetry and observability in previous projects."

    Technical Program Manager
    Technical
  • Data Engineer
    Data Pipeline Design
  • Nitish C. - "An all-purpose cluster stays up and running for longer durations irrespective of any job, so it is preferred for notebooks and ad hoc work, whereas a job cluster spins up for the submitted job and shuts down after completion, so it is preferred for scheduled production workloads, as it also offers compute isolation."

    Data Engineer
    Data Pipeline Design