
ExamsBrite Dumps

Databricks Certified Data Engineer Associate Exam Question and Answers

Databricks Certified Data Engineer Associate Exam

Last Update Feb 28, 2026
Total Questions : 159

We are offering free Databricks-Certified-Data-Engineer-Associate exam questions. Just sign up and provide your details to prepare with the free exam questions, then move on to the complete pool of Databricks Certified Data Engineer Associate test questions.

Databricks-Certified-Data-Engineer-Associate PDF: $36.75 (regular price $104.99)

Databricks-Certified-Data-Engineer-Associate Testing Engine: $43.75 (regular price $124.99)

Databricks-Certified-Data-Engineer-Associate PDF + Testing Engine: $57.75 (regular price $164.99)
Questions 1

Which of the following commands will return the location of database customer360?

Options:

A.  

DESCRIBE LOCATION customer360;

B.  

DROP DATABASE customer360;

C.  

DESCRIBE DATABASE customer360;

D.  

ALTER DATABASE customer360 SET DBPROPERTIES ('location' = '/user');

E.  

USE DATABASE customer360;

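For reference, the DESCRIBE DATABASE command returns database metadata that includes its location. A minimal sketch:

```sql
-- Returns metadata for the database, including its storage location
DESCRIBE DATABASE customer360;

-- EXTENDED additionally returns database properties (DBPROPERTIES)
DESCRIBE DATABASE EXTENDED customer360;
```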
Questions 2

Which file format is used for storing a Delta Lake table?

Options:

A.  

Parquet

B.  

Delta

C.  

CSV

D.  

JSON

Questions 3

A data engineer wants to reduce costs and optimize cloud spending. The data engineer has decided to use Databricks Serverless for lowering cloud costs while maintaining existing SLAs.

What is the first step in migrating to Databricks Serverless?

Options:

A.  

Legacy ingestion pipelines that ingest from source APIs, files, and JDBC/ODBC connections

B.  

Low-frequency BI dashboarding and ad-hoc SQL analytics

C.  

A frequently running and efficient Python-based data transformation pipeline compatible with the latest Databricks runtime and Unity Catalog

D.  

A frequently running and efficient Scala-based data transformation pipeline compatible with the latest Databricks runtime and Unity Catalog

Questions 4

A data engineer streams customer orders into a Kafka topic (orders_topic) and is currently writing the ingestion script of a DLT pipeline. The data engineer needs to ingest the data from the Kafka brokers into DLT using Databricks.

What is the correct code for ingesting the data?

A)

B)

C)

D)

Options:

A.  

Option A

B.  

Option B

C.  

Option C

D.  

Option D

Questions 5

A data engineer needs to process SQL queries on a large dataset with fluctuating workloads. The workload requires automatic scaling based on the volume of queries, without the need to manage or provision infrastructure. The solution should be cost-efficient and charge only for the compute resources used during query execution.

Which compute option should the data engineer use?

Options:

A.  

Databricks SQL Analytics

B.  

Databricks Jobs

C.  

Databricks Runtime for ML

D.  

Serverless SQL Warehouse

Questions 6

Identify the impact of ON VIOLATION DROP ROW and ON VIOLATION FAIL UPDATE for a constraint violation.

A data engineer has created an ETL pipeline using Delta Live Tables to manage their company's travel reimbursement details. They want to ensure that if an employee has not provided location details, the pipeline is terminated.

How can the scenario be implemented?

Options:

A.  

CONSTRAINT valid_location EXPECT (location = NULL)

B.  

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL UPDATE

C.  

CONSTRAINT valid_location EXPECT (location != NULL) ON DROP ROW

D.  

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL

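The FAIL UPDATE pattern can be sketched in DLT SQL. Table names here are hypothetical, and the predicate uses IS NOT NULL, the semantically correct form of the `location != NULL` comparison shown in the options (comparing with `!=` against NULL never evaluates to true in SQL):

```sql
CREATE OR REFRESH STREAMING LIVE TABLE reimbursements_validated (
  -- Terminate the pipeline update if any row arrives without a location
  CONSTRAINT valid_location EXPECT (location IS NOT NULL) ON VIOLATION FAIL UPDATE
)
AS SELECT * FROM STREAM(LIVE.reimbursements_raw);
```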
Questions 7

A data engineer needs to optimize the data layout and query performance for an e-commerce transactions Delta table. The table is partitioned by "purchase_date", a date column which helps with time-based queries but does not optimize searches on "customer_id", a high-cardinality column.

The table is usually queried with filters on "customer_id" within specific date ranges, but since this data is spread across multiple files in each partition, it results in full partition scans and increased runtime and costs.

How should the data engineer optimize the Data Layout for efficient reads?

Options:

A.  

Alter the table, implementing liquid clustering on "customer_id" while keeping the existing partitioning.

B.  

Alter the table to partition by "customer_id".

C.  

Enable delta caching on the cluster so that frequent reads are cached for performance.

D.  

Alter the table implementing liquid clustering by "customer_id" and "purchase_date".

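A liquid clustering sketch for this scenario. Note that liquid clustering and Hive-style partitioning are mutually exclusive, so the table would need to be created (or recreated) without PARTITIONED BY; the table name and schema below are assumptions:

```sql
-- Recreate without partitioning; cluster by both filter columns
CREATE TABLE transactions_clustered
CLUSTER BY (customer_id, purchase_date)
AS SELECT * FROM transactions;

-- Incrementally cluster newly written data
OPTIMIZE transactions_clustered;
```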
Questions 8

In which of the following scenarios should a data engineer use the MERGE INTO command instead of the INSERT INTO command?

Options:

A.  

When the location of the data needs to be changed

B.  

When the target table is an external table

C.  

When the source table can be deleted

D.  

When the target table cannot contain duplicate records

E.  

When the source is not a Delta table

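MERGE INTO performs an upsert, which is what keeps a target table free of duplicate records; a sketch with hypothetical table and column names:

```sql
-- Update rows that already exist, insert only the new ones
MERGE INTO customers AS t
USING customer_updates AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```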
Questions 9

A data engineer is attempting to write Python and SQL in the same command cell and is running into an error. The engineer thought that it was possible to use a Python variable in a SELECT statement.

Why does the command fail?

Options:

A.  

Databricks supports multiple languages but only one per notebook.

B.  

Databricks supports language interoperability in the same cell but only between Scala and SQL

C.  

Databricks supports language interoperability but only if a special character is used.

D.  

Databricks supports one language per cell.

Questions 10

A new data engineering team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which command can be used to grant full permissions on the table to the new data engineering team?

Options:

A.  

grant all privileges on table sales TO team;

B.  

GRANT SELECT ON TABLE sales TO team;

C.  

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

D.  

GRANT ALL PRIVILEGES ON TABLE team TO sales;

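The full-privileges grant, plus a verification step; `team` is the group name from the question:

```sql
-- Give the group every privilege on the table
GRANT ALL PRIVILEGES ON TABLE sales TO team;

-- Confirm what was granted
SHOW GRANTS ON TABLE sales;
```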
Questions 11

A data engineer has been provided a PySpark DataFrame named df with columns product and revenue. The data engineer needs to compute complex aggregations to determine each product's total revenue, average revenue, and transaction count.

Which code snippet should the data engineer use?

A)

B)

C)

D)

Options:

A.  

Option A

B.  

Option B

C.  

Option C

D.  

Option D

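The option images are not reproduced here, but the aggregation itself can be sketched in Spark SQL, assuming the DataFrame `df` has been registered as a temporary view named `df`:

```sql
-- Total revenue, average revenue, and transaction count per product
SELECT product,
       SUM(revenue) AS total_revenue,
       AVG(revenue) AS avg_revenue,
       COUNT(*)     AS transaction_count
FROM df
GROUP BY product;
```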
Questions 12

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

Which line of code should the data engineer use to fill in the blank if the data engineer only wants the query to execute a micro-batch to process data every 5 seconds?

Options:

A.  

trigger("5 seconds")

B.  

trigger(continuous="5 seconds")

C.  

trigger(once="5 seconds")

D.  

trigger(processingTime="5 seconds")

Questions 13

A data engineer wants to create a new table containing the names of customers who live in France.

They have written the following command:

CREATE TABLE customersInFrance

_____ AS

SELECT id,

firstName,

lastName

FROM customerLocations

WHERE country = 'FRANCE';

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (PII).

Which line of code fills in the above blank to successfully complete the task?

Options:

A.  

COMMENT "Contains PII"

B.  

511

C.  

"COMMENT PII"

D.  

TBLPROPERTIES PII

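A table property is set with a TBLPROPERTIES clause in the CTAS statement; the property key below is a hypothetical name chosen for illustration:

```sql
CREATE TABLE customersInFrance
TBLPROPERTIES ('contains_pii' = 'true')  -- property key is an assumption
AS
SELECT id, firstName, lastName
FROM customerLocations
WHERE country = 'FRANCE';
```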
Questions 14

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Production mode using Continuous Pipeline Mode.

What is the expected outcome after clicking Start to update the pipeline assuming previously unprocessed data exists and all definitions are valid?

Options:

A.  

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

B.  

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

C.  

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

D.  

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

Questions 15

A single Job runs two notebooks as two separate tasks. A data engineer has noticed that one of the notebooks is running slowly in the Job’s current run. The data engineer asks a tech lead for help in identifying why this might be the case.

Which of the following approaches can the tech lead use to identify why the notebook is running slowly as part of the Job?

Options:

A.  

They can navigate to the Runs tab in the Jobs UI to immediately review the processing notebook.

B.  

They can navigate to the Tasks tab in the Jobs UI and click on the active run to review the processing notebook.

C.  

They can navigate to the Runs tab in the Jobs UI and click on the active run to review the processing notebook.

D.  

There is no way to determine why a Job task is running slowly.

E.  

They can navigate to the Tasks tab in the Jobs UI to immediately review the processing notebook.

Questions 16

A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table:

Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?

Options:

A.  

Replace predict with a stream-friendly prediction function

B.  

Replace schema(schema) with option ("maxFilesPerTrigger", 1)

C.  

Replace "transactions" with the path to the location of the Delta table

D.  

Replace format("delta") with format("stream")

E.  

Replace spark.read with spark.readStream

Questions 17

A new data engineering team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which of the following commands can be used to grant full permissions on the table to the new data engineering team?

Options:

A.  

GRANT ALL PRIVILEGES ON TABLE sales TO team;

B.  

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

C.  

GRANT SELECT ON TABLE sales TO team;

D.  

GRANT USAGE ON TABLE sales TO team;

E.  

GRANT ALL PRIVILEGES ON TABLE team TO sales;

Questions 18

What is the structure of an Asset Bundle?

Options:

A.  

A single plain text file enumerating the names of assets to be migrated to a new workspace.

B.  

A compressed archive (ZIP) that solely contains workspace assets without any accompanying metadata.

C.  

A YAML configuration file that specifies the artifacts, resources, and configurations for the project.

D.  

A Docker image containing runtime environments and the source code of the assets

Questions 19

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which change will need to be made to the pipeline when migrating to Delta Live Tables?

Options:

A.  

The pipeline can have different notebook sources in SQL & Python.

B.  

The pipeline will need to be written entirely in SQL.

C.  

The pipeline will need to be written entirely in Python.

D.  

The pipeline will need to use a batch source in place of a streaming source.

Questions 20

Which of the following tools is used by Auto Loader to process data incrementally?

Options:

A.  

Checkpointing

B.  

Spark Structured Streaming

C.  

Data Explorer

D.  

Unity Catalog

E.  

Databricks SQL

Questions 21

An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.

Which of the following approaches can the manager use to ensure the results of the query are updated each day?

Options:

A.  

They can schedule the query to refresh every 1 day from the SQL endpoint's page in Databricks SQL.

B.  

They can schedule the query to refresh every 12 hours from the SQL endpoint's page in Databricks SQL.

C.  

They can schedule the query to refresh every 1 day from the query's page in Databricks SQL.

D.  

They can schedule the query to run every 1 day from the Jobs UI.

E.  

They can schedule the query to run every 12 hours from the Jobs UI.

Questions 22

An organization is looking for an optimized storage layer that supports ACID transactions and schema enforcement. Which technology should the organization use?

Options:

A.  

Cloud File Storage

B.  

Unity Catalog

C.  

Data lake

D.  

Delta Lake

Questions 23

A data engineer needs to ingest from both streaming and batch sources for a firm that relies on highly accurate data. Occasionally, some of the data picked up by the sensors that provide a streaming input are outside the expected parameters. If this occurs, the data must be dropped, but the stream should not fail.

Which feature of Delta Live Tables meets this requirement?

Options:

A.  

Monitoring

B.  

Change Data Capture

C.  

Expectations

D.  

Error Handling

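An expectation with ON VIOLATION DROP ROW discards failing records while the stream keeps running; a sketch with hypothetical table names and a hypothetical sensor range:

```sql
CREATE OR REFRESH STREAMING LIVE TABLE sensor_readings_clean (
  -- Drop out-of-range readings without failing the pipeline
  CONSTRAINT in_range EXPECT (reading BETWEEN 0 AND 100) ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(LIVE.sensor_readings_raw);
```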
Questions 24

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which location can the data engineer review their permissions on the table?

Options:

A.  

Jobs

B.  

Dashboards

C.  

Catalog Explorer

D.  

Repos

Questions 25

A data engineer has joined an existing project and they see the following query in the project repository:

CREATE STREAMING LIVE TABLE loyal_customers AS

SELECT customer_id

FROM STREAM(LIVE.customers)

WHERE loyalty_level = 'high';

Which of the following describes why the STREAM function is included in the query?

Options:

A.  

The STREAM function is not needed and will cause an error.

B.  

The table being created is a live table.

C.  

The customers table is a streaming live table.

D.  

The customers table is a reference to a Structured Streaming query on a PySpark DataFrame.

E.  

The data in the customers table has been updated since its last run.

Questions 26

A data engineer has left the organization. The data team needs to transfer ownership of the data engineer’s Delta tables to a new data engineer. The new data engineer is the lead engineer on the data team.

Assuming the original data engineer no longer has access, which of the following individuals must be the one to transfer ownership of the Delta tables in Data Explorer?

Options:

A.  

Databricks account representative

B.  

This transfer is not possible

C.  

Workspace administrator

D.  

New lead data engineer

E.  

Original data engineer

Questions 27

A data organization leader is upset about the data analysis team’s reports being different from the data engineering team’s reports. The leader believes the siloed nature of their organization’s data engineering and data analysis architectures is to blame.

Which of the following describes how a data lakehouse could alleviate this issue?

Options:

A.  

Both teams would autoscale their work as data size evolves

B.  

Both teams would use the same source of truth for their work

C.  

Both teams would reorganize to report to the same department

D.  

Both teams would be able to collaborate on projects in real-time

E.  

Both teams would respond more quickly to ad-hoc requests

Questions 28

A new data engineering team has been assigned to work on a project. The team will need access to database customers in order to see what tables already exist. The team has its own group, team.

Which of the following commands can be used to grant the necessary permission on the entire database to the new team?

Options:

A.  

GRANT VIEW ON CATALOG customers TO team;

B.  

GRANT CREATE ON DATABASE customers TO team;

C.  

GRANT USAGE ON CATALOG team TO customers;

D.  

GRANT CREATE ON DATABASE team TO customers;

E.  

GRANT USAGE ON DATABASE customers TO team;

Questions 29

A data engineer manages multiple external tables linked to various data sources. The data engineer wants to manage these external tables efficiently and ensure that only the necessary permissions are granted to users for accessing specific external tables.

How should the data engineer manage access to these external tables?

Options:

A.  

Create a single user role with full access to all external tables and assign it to all users.

B.  

Use Unity Catalog to manage access controls and permissions for each external table individually.

C.  

Set up Azure Blob Storage permissions at the container level, allowing access to all external tables.

D.  

Grant permissions on the Databricks workspace level, which will automatically apply to all external tables.

Questions 30

A data engineer has been given a new record of data:

id STRING = 'a1'

rank INTEGER = 6

rating FLOAT = 9.4

Which of the following SQL commands can be used to append the new record to an existing Delta table my_table?

Options:

A.  

INSERT INTO my_table VALUES ('a1', 6, 9.4)

B.  

my_table UNION VALUES ('a1', 6, 9.4)

C.  

INSERT VALUES ( 'a1' , 6, 9.4) INTO my_table

D.  

UPDATE my_table VALUES ('a1', 6, 9.4)

E.  

UPDATE VALUES ('a1', 6, 9.4) my_table

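The standard append syntax, using the record from the question:

```sql
-- Appends one row to the existing Delta table
INSERT INTO my_table VALUES ('a1', 6, 9.4);
```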
Questions 31

Which of the following can be used to simplify and unify siloed data architectures that are specialized for specific use cases?

Options:

A.  

None of these

B.  

Data lake

C.  

Data warehouse

D.  

All of these

E.  

Data lakehouse

Questions 32

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to a data analytics dashboard for a retail use case. The job has a Databricks SQL query that returns the number of store-level records where sales is equal to zero. The data engineer wants their entire team to be notified via a messaging webhook whenever this value is greater than 0.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of stores with $0 in sales is greater than zero?

Options:

A.  

They can set up an Alert with a custom template.

B.  

They can set up an Alert with a new email alert destination.

C.  

They can set up an Alert with one-time notifications.

D.  

They can set up an Alert with a new webhook alert destination.

E.  

They can set up an Alert without notifications.

Questions 33

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to an ELT job. The ELT job has its Databricks SQL query that returns the number of input records containing unexpected NULL values. The data engineer wants their entire team to be notified via a messaging webhook whenever this value reaches 100.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of NULL values reaches 100?

Options:

A.  

They can set up an Alert with a custom template.

B.  

They can set up an Alert with a new email alert destination.

C.  

They can set up an Alert with a new webhook alert destination.

D.  

They can set up an Alert with one-time notifications.

E.  

They can set up an Alert without notifications.

Questions 34

A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command:

CREATE TABLE jdbc_customer360

USING

OPTIONS (

url "jdbc:sqlite:/customers.db", dbtable "customer360"

)

Which line of code fills in the above blank to successfully complete the task?

Options:

A.  

autoloader

B.  

org.apache.spark.sql.jdbc

C.  

sqlite

D.  

org.apache.spark.sql.sqlite

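With the blank filled by Spark's JDBC data source, the complete command from the question reads:

```sql
CREATE TABLE jdbc_customer360
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:sqlite:/customers.db",
  dbtable "customer360"
);
```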
Questions 35

A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data.

Which of the following relational objects should the data engineer create?

Options:

A.  

Spark SQL Table

B.  

View

C.  

Database

D.  

Temporary view

E.  

Delta Table

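A temporary view is session-scoped and stores no physical data, matching both requirements; a sketch with hypothetical table and column names:

```sql
-- Session-scoped; only the query definition is stored, not the data
CREATE TEMPORARY VIEW customer_orders AS
SELECT c.customer_id, c.name, o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
```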
Questions 36

A data engineering project involves processing large batches of data on a daily schedule using ETL. The jobs are resource-intensive and vary in size, requiring a scalable, cost-efficient compute solution that can automatically scale based on the workload.

Which compute approach will satisfy the needs described?

Options:

A.  

Databricks SQL Serverless

B.  

Dedicated Cluster

C.  

All-Purpose Cluster

D.  

Job Cluster

Questions 37

Which of the following describes the storage organization of a Delta table?

Options:

A.  

Delta tables are stored in a single file that contains data, history, metadata, and other attributes.

B.  

Delta tables store their data in a single file and all metadata in a collection of files in a separate location.

C.  

Delta tables are stored in a collection of files that contain data, history, metadata, and other attributes.

D.  

Delta tables are stored in a collection of files that contain only the data stored within the table.

E.  

Delta tables are stored in a single file that contains only the data stored within the table.

Questions 38

A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which approach can be used to identify the owner of new_table?

Options:

A.  

There is no way to identify the owner of the table

B.  

Review the Owner field in the table's page in the cloud storage solution

C.  

Review the Permissions tab in the table's page in Data Explorer

D.  

Review the Owner field in the table’s page in Data Explorer

Questions 39

A Databricks single-task workflow fails at the last task due to an error in a notebook. The data engineer fixes the mistake in the notebook. What should the data engineer do to rerun the workflow?

Options:

A.  

Repair the task

B.  

Rerun the pipeline

C.  

Restart the Cluster

D.  

Switch the cluster

Questions 40

Which of the following describes the relationship between Gold tables and Silver tables?

Options:

A.  

Gold tables are more likely to contain aggregations than Silver tables.

B.  

Gold tables are more likely to contain valuable data than Silver tables.

C.  

Gold tables are more likely to contain a less refined view of data than Silver tables.

D.  

Gold tables are more likely to contain more data than Silver tables.

E.  

Gold tables are more likely to contain truthful data than Silver tables.

Questions 41

A data engineer is processing ingested streaming tables and needs to filter out NULL values in the order_datetime column from the raw streaming table orders_raw and store the results in a new table orders_valid using DLT.

Which code snippet should the data engineer use?

A)

B)

C)

D)

Options:

A.  

Option A

B.  

Option B

C.  

Option C

D.  

Option D

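The option images are not reproduced here, but the filter itself can be sketched in DLT SQL, using the table names from the question:

```sql
-- Keep only rows where order_datetime is present
CREATE OR REFRESH STREAMING LIVE TABLE orders_valid AS
SELECT *
FROM STREAM(LIVE.orders_raw)
WHERE order_datetime IS NOT NULL;
```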
Questions 42

Which SQL keyword can be used to convert a table from a long format to a wide format?

Options:

A.  

TRANSFORM

B.  

PIVOT

C.  

SUM

D.  

CONVERT

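Spark SQL's PIVOT clause turns the distinct values of one column into columns, converting long format to wide; a sketch with a hypothetical sales table of (product, quarter, revenue):

```sql
-- One output column per quarter, aggregating revenue
SELECT * FROM sales
PIVOT (
  SUM(revenue) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4')
);
```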
Questions 43

A data engineer wants to create a data entity from a couple of tables. The data entity must be used by other data engineers in other sessions. It also must be saved to a physical location.

Which of the following data entities should the data engineer create?

Options:

A.  

Database

B.  

Function

C.  

View

D.  

Temporary view

E.  

Table

Questions 44

A data engineer is designing an ETL pipeline to process both streaming and batch data from multiple sources. The pipeline must ensure data quality, handle schema evolution, and provide easy maintenance. The team is considering using Delta Live Tables (DLT) in Databricks to achieve these goals. They want to understand the key features and benefits of DLT that make it suitable for this use case.

Why is Delta Live Tables (DLT) an appropriate choice?

Options:

A.  

Automatic data quality checks, built-in support for schema evolution, and declarative pipeline development

B.  

Manual schema enforcement, high operational overhead, and limited scalability

C.  

Requires custom code for data quality checks, no support for streaming data, and complex pipeline maintenance

D.  

Supports only batch processing, no data versioning, and high infrastructure costs

Questions 45

Which of the following must be specified when creating a new Delta Live Tables pipeline?

Options:

A.  

A key-value pair configuration

B.  

The preferred DBU/hour cost

C.  

A path to cloud storage location for the written data

D.  

A location of a target database for the written data

E.  

At least one notebook library to be executed

Questions 46

A data engineer needs to conduct Exploratory Analysis on data residing in a database that is within the company's custom-defined network in the cloud. The data engineer is using SQL for this task.

Which type of SQL Warehouse will enable the data engineer to process large numbers of queries quickly and cost-effectively?

Options:

A.  

Serverless compute for notebooks

B.  

Serverless SQL Warehouse

C.  

Classic SQL Warehouse

D.  

Pro SQL Warehouse

Questions 47

A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in array column employees in table stores. The custom logic should create a new column exp_employees that is an array of all of the employees with more than 5 years of experience for each row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function.

Which of the following code blocks successfully completes this task?

Options:

A.  

Option A

B.  

Option B

C.  

Option C

D.  

Option D

E.  

Option E

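The option code blocks are not reproduced here, but the FILTER higher-order function pattern can be sketched in Spark SQL, assuming employees is an array of structs with a hypothetical years_experience field:

```sql
-- FILTER keeps only array elements for which the lambda returns true
SELECT *,
       FILTER(employees, e -> e.years_experience > 5) AS exp_employees
FROM stores;
```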