ExamsBrite Dumps

Databricks Certified Data Engineer Associate Exam Question and Answers

Last Update Nov 30, 2025
Total Questions : 153

Question 1

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

Which line of code should the data engineer use to fill in the blank if the data engineer only wants the query to execute a micro-batch to process data every 5 seconds?

Options:

A.  

trigger("5 seconds")

B.  

trigger(continuous="5 seconds")

C.  

trigger(once="5 seconds")

D.  

trigger(processingTime="5 seconds")
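For intuition, here is a plain-Python model of how a fixed-interval trigger such as `trigger(processingTime="5 seconds")` schedules micro-batches. It is a simplified sketch, not Spark code: the function name and batch durations are hypothetical, and it assumes new data is always available (Spark skips a micro-batch when there is none). It follows the documented rule that a batch which finishes early waits for the next interval boundary, while a batch that overruns its interval is followed immediately by the next one.

```python
def micro_batch_start_times(durations, interval):
    """Return the start time (in seconds) of each micro-batch under a
    simplified fixed-interval (processingTime) trigger model."""
    starts = []
    t = 0
    for duration in durations:
        starts.append(t)
        finished = t + duration
        # First interval boundary after this batch's start time
        next_boundary = (t // interval + 1) * interval
        # Wait for that boundary, or start immediately if the batch overran it
        t = max(finished, next_boundary)
    return starts

print(micro_batch_start_times([1, 7, 2, 1], interval=5))  # [0, 5, 12, 15]
```

With a 5-second interval, the 7-second second batch overruns its slot, so the third batch starts at 12 seconds rather than 10, and the schedule realigns to the boundary at 15.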

Question 2

A data engineer needs to optimize the data layout and query performance for an e-commerce transactions Delta table. The table is partitioned by "purchase_date", a date column, which helps with time-based queries but does not optimize searches on the high-cardinality "customer_id" column.

The table is usually queried with filters on "customer_id" within specific date ranges, but because this data is spread across multiple files in each partition, these queries result in full partition scans, increasing runtime and costs.

How should the data engineer optimize the data layout for efficient reads?

Options:

A.  

Alter the table to implement liquid clustering on "customer_id" while keeping the existing partitioning.

B.  

Alter the table to partition by "customer_id".

C.  

Enable delta caching on the cluster so that frequent reads are cached for performance.

D.  

Alter the table to implement liquid clustering by "customer_id" and "purchase_date".

Question 3

A global retail company sells products across multiple categories (e.g., Electronics, Clothing) and regions (e.g., North, South, East, West). The sales team has provided the data engineer with a PySpark DataFrame named sales_df, shown below, and wants the data engineer to analyze the sales data to help them make strategic decisions.

Options:

A.  

Category_sales = sales_df.groupBy("category").agg(sum("sales_amount").alias("total_sales_amount"))

B.  

Category_sales = sales_df.sum("sales_amount").groupBy("category").alias("total_sales_amount")

C.  

Category_sales = sales_df.agg(sum("sales_amount").groupBy("category").alias("total_sales_amount"))

D.  

Category_sales = sales_df.groupBy("region").agg(sum("sales_amount").alias("total_sales_amount"))
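The sales_df sample referenced above is not reproduced here, so the sketch below uses hypothetical rows to show, in plain Python, what a `groupBy("category").agg(sum("sales_amount"))` aggregation computes: one total per distinct category. In PySpark, the `sum` in these options must be the column function from `pyspark.sql.functions`, not Python's built-in `sum`; grouping by region instead answers a different question.

```python
from collections import defaultdict

# Hypothetical rows standing in for sales_df: (category, region, sales_amount)
rows = [
    ("Electronics", "North", 1200.0),
    ("Clothing", "South", 300.0),
    ("Electronics", "East", 800.0),
    ("Clothing", "West", 450.0),
]

def total_sales_by(rows, key_index, amount_index=2):
    """Sum sales_amount per distinct key, like groupBy(key).agg(sum("sales_amount"))."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[amount_index]
    return dict(totals)

print(total_sales_by(rows, key_index=0))  # {'Electronics': 2000.0, 'Clothing': 750.0}
```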

Question 4

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

Options:

A.  

A data lakehouse provides storage solutions for structured and unstructured data.

B.  

A data lakehouse supports ACID-compliant transactions.

C.  

A data lakehouse allows the use of SQL queries to examine data.

D.  

A data lakehouse stores data in open formats.

E.  

A data lakehouse enables machine learning and artificial intelligence workloads.

Question 5

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE

What is the expected behavior when a batch containing data that violates this constraint is processed?

Options:

A.  

Records that violate the expectation cause the job to fail.

B.  

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

C.  

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

D.  

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.
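DLT expectations support three violation behaviors: with no ON VIOLATION clause, invalid records are kept and counted in the dataset's metrics; ON VIOLATION DROP ROW drops them; ON VIOLATION FAIL UPDATE fails the update. The plain-Python sketch below models those behaviors for the valid_timestamp constraint above. It is not the DLT API; the function, the policy names "warn", "drop", and "fail", and the sample batch are hypothetical, and dates stand in for timestamps.

```python
from datetime import date

class ExpectationFailed(Exception):
    """Models an update that stops because of ON VIOLATION FAIL UPDATE."""

def apply_expectation(records, predicate, policy="warn"):
    """Return (records_written, violation_count) for a hypothetical policy name.

    "warn": keep invalid records and only count them (no ON VIOLATION clause)
    "drop": remove invalid records and count them (ON VIOLATION DROP ROW)
    "fail": raise if any record is invalid (ON VIOLATION FAIL UPDATE)
    """
    if policy not in ("warn", "drop", "fail"):
        raise ValueError(f"unknown policy: {policy}")
    valid = [r for r in records if predicate(r)]
    violations = len(records) - len(valid)
    if policy == "fail" and violations:
        raise ExpectationFailed(f"{violations} record(s) violated the expectation")
    if policy == "drop":
        return valid, violations
    return list(records), violations

def valid_timestamp(record):
    # Mirrors: EXPECT (timestamp > '2020-01-01')
    return record["timestamp"] > date(2020, 1, 1)

batch = [
    {"id": 1, "timestamp": date(2021, 5, 1)},
    {"id": 2, "timestamp": date(2019, 12, 31)},  # violates the expectation
]
```

Calling `apply_expectation(batch, valid_timestamp, "fail")` raises `ExpectationFailed`, which corresponds to the FAIL UPDATE clause in the question.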

Question 6

Which of the following commands will return the location of database customer360?

Options:

A.  

DESCRIBE LOCATION customer360;

B.  

DROP DATABASE customer360;

C.  

DESCRIBE DATABASE customer360;

D.  

ALTER DATABASE customer360 SET DBPROPERTIES ('location' = '/user');

E.  

USE DATABASE customer360;

Question 7

A data engineer has written a function in a Databricks Notebook to calculate the population of bacteria in a given medium.

Analysts use this function in the notebook and sometimes provide input arguments of the wrong data type, which can cause errors during execution.

Which Databricks feature will help the data engineer quickly identify if an incorrect data type has been provided as input?

Options:

A.  

The data engineer should add print statements to find out what the variable is.

B.  

The Databricks debugger enables breakpoints that will raise an error if the wrong data type is submitted

C.  

The Spark User interface has a debug tab that contains the variables that are used in this session.

D.  

The Databricks debugger enables the use of a variable explorer to see at a glance the value of the variables.

Question 8

An organization plans to share a large dataset stored in a Databricks workspace on AWS with a partner organization whose Databricks workspace is hosted on Azure. The data engineer wants to minimize data transfer costs while ensuring secure and efficient data sharing.

Which strategy will reduce data egress costs associated with cross-cloud data sharing?

Options:

A.  

Sharing data via pre-signed URLs without monitoring egress costs

B.  

Migrating the dataset to Cloudflare R2 object storage before sharing

C.  

Configuring a VPN connection between AWS and Azure for faster data sharing

D.  

Using Delta Sharing without any additional configurations

Question 9

Which of the following SQL keywords can be used to convert a table from a long format to a wide format?

Options:

A.  

PIVOT

B.  

CONVERT

C.  

WHERE

D.  

TRANSFORM

E.  

SUM
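PIVOT turns the distinct values of one column into separate output columns, aggregating a value for each cell. As a rough plain-Python illustration (the table shape, column names, and function are hypothetical), the sketch below performs the same reshaping that a clause like `PIVOT (SUM(sales) FOR quarter IN ('Q1', 'Q2'))` would, leaving missing combinations as None where SQL would return NULL.

```python
def pivot_sum(rows, index, column, value):
    """Reshape long rows to wide: one row per `index` value and one column per
    distinct `column` value, summing `value` in each cell."""
    wide = {}      # index value -> {pivot column -> running sum}
    columns = []   # distinct pivot values in first-seen order
    for row in rows:
        key, col = row[index], row[column]
        if col not in columns:
            columns.append(col)
        cells = wide.setdefault(key, {})
        cells[col] = cells.get(col, 0) + row[value]
    # Combinations with no input rows become None (SQL PIVOT would give NULL)
    return [{index: key, **{c: cells.get(c) for c in columns}} for key, cells in wide.items()]

# Hypothetical long-format data: one row per (product, quarter)
long_rows = [
    {"product": "A", "quarter": "Q1", "sales": 10},
    {"product": "A", "quarter": "Q2", "sales": 15},
    {"product": "B", "quarter": "Q1", "sales": 7},
]
print(pivot_sum(long_rows, "product", "quarter", "sales"))
# [{'product': 'A', 'Q1': 10, 'Q2': 15}, {'product': 'B', 'Q1': 7, 'Q2': None}]
```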

Question 10

Which of the following describes the relationship between Bronze tables and raw data?

Options:

A.  

Bronze tables contain less data than raw data files.

B.  

Bronze tables contain more truthful data than raw data.

C.  

Bronze tables contain aggregates while raw data is unaggregated.

D.  

Bronze tables contain a less refined view of data than raw data.

E.  

Bronze tables contain raw data with a schema applied.
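To make the medallion layers concrete, the plain-Python sketch below follows hypothetical records through each layer: Bronze keeps every raw record but parses it into named fields, Silver cleans and types the data (dropping a malformed amount), and Gold aggregates it for reporting. The record format and function names are illustrative assumptions, not a Databricks API.

```python
import json

# Hypothetical raw records as they might land from a source system
raw = [
    '{"user": "a", "amount": "10"}',
    '{"user": "b", "amount": "oops"}',
    '{"user": "a", "amount": "5"}',
]

def to_bronze(raw_lines):
    """Bronze: keep every raw record, parsed into named fields (schema applied)."""
    bronze = []
    for line in raw_lines:
        record = json.loads(line)
        bronze.append({"user": record.get("user"), "amount": record.get("amount")})
    return bronze

def to_silver(bronze_rows):
    """Silver: cleaned, typed rows; records with a non-integer amount are dropped."""
    silver = []
    for row in bronze_rows:
        try:
            silver.append({"user": row["user"], "amount": int(row["amount"])})
        except (TypeError, ValueError):
            continue  # malformed amount: leave it out of Silver
    return silver

def to_gold(silver_rows):
    """Gold: business-level aggregate, here the total amount per user."""
    totals = {}
    for row in silver_rows:
        totals[row["user"]] = totals.get(row["user"], 0) + row["amount"]
    return totals

bronze = to_bronze(raw)
silver = to_silver(bronze)
print(len(bronze), len(silver), to_gold(silver))  # 3 2 {'a': 15}
```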

Question 11

A new data engineering team, whose group is named team, has been assigned to an ELT project. The new data engineering team will need full privileges on the database customers to fully manage the project.

Which of the following commands can be used to grant full permissions on the database to the new data engineering team?

Options:

A.  

GRANT USAGE ON DATABASE customers TO team;

B.  

GRANT ALL PRIVILEGES ON DATABASE team TO customers;

C.  

GRANT SELECT PRIVILEGES ON DATABASE customers TO teams;

D.  

GRANT SELECT CREATE MODIFY USAGE PRIVILEGES ON DATABASE customers TO team;

E.  

GRANT ALL PRIVILEGES ON DATABASE customers TO team;

Question 12

A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.

Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?

Options:

A.  

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."

B.  

They can turn on the Auto Stop feature for the SQL endpoint.

C.  

They can increase the cluster size of the SQL endpoint.

D.  

They can turn on the Serverless feature for the SQL endpoint.

E.  

They can increase the maximum bound of the SQL endpoint's scaling range

Question 13

An organization is looking for an optimized storage layer that supports ACID transactions and schema enforcement. Which technology should the organization use?

Options:

A.  

Cloud File Storage

B.  

Unity Catalog

C.  

Data lake

D.  

Delta Lake

Question 14

A new data engineering team has been assigned to work on a project. The team will need access to the database customers in order to see what tables already exist. The team has its own group, named team.

Which of the following commands can be used to grant the necessary permission on the entire database to the new team?

Options:

A.  

GRANT VIEW ON CATALOG customers TO team;

B.  

GRANT CREATE ON DATABASE customers TO team;

C.  

GRANT USAGE ON CATALOG team TO customers;

D.  

GRANT CREATE ON DATABASE team TO customers;

E.  

GRANT USAGE ON DATABASE customers TO team;

Question 15

Which of the following can be used to simplify and unify siloed data architectures that are specialized for specific use cases?

Options:

A.  

None of these

B.  

Data lake

C.  

Data warehouse

D.  

All of these

E.  

Data lakehouse

Question 16

A data engineer has a Job with multiple tasks that runs nightly. Each of the tasks runs slowly because the clusters take a long time to start.

Which of the following actions can the data engineer perform to improve the startup time for the clusters used for the Job?

Options:

A.  

They can use endpoints available in Databricks SQL

B.  

They can use jobs clusters instead of all-purpose clusters

C.  

They can configure the clusters to be single-node

D.  

They can use clusters that are from a cluster pool

E.  

They can configure the clusters to autoscale for larger data sizes

Question 17

Which of the following Git operations must be performed outside of Databricks Repos?

Options:

A.  

Commit

B.  

Pull

C.  

Push

D.  

Clone

E.  

Merge

Question 18

Which method should a data engineer apply to ensure Workflows are being triggered on schedule?

Options:

A.  

Scheduled Workflows require an always-running cluster, which is more expensive but reduces processing latency.

B.  

Scheduled Workflows process data as it arrives at configured sources.

C.  

Scheduled Workflows can reduce resource consumption and expense since the cluster runs only long enough to execute the pipeline.

D.  

Scheduled Workflows run continuously until manually stopped.

Question 19

A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table:

Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?

Options:

A.  

Replace predict with a stream-friendly prediction function

B.  

Replace schema(schema) with option("maxFilesPerTrigger", 1)

C.  

Replace "transactions" with the path to the location of the Delta table

D.  

Replace format("delta") with format("stream")

E.  

Replace spark.read with spark.readStream

Question 20

A data engineer is attempting to write Python and SQL in the same command cell and is running into an error. The engineer thought that it was possible to use a Python variable in a SELECT statement.

Why does the command fail?

Options:

A.  

Databricks supports multiple languages but only one per notebook.

B.  

Databricks supports language interoperability in the same cell but only between Scala and SQL

C.  

Databricks supports language interoperability but only if a special character is used.

D.  

Databricks supports one language per cell.

Question 21

What is the structure of an Asset Bundle?

Options:

A.  

A single plain text file enumerating the names of assets to be migrated to a new workspace.

B.  

A compressed archive (ZIP) that solely contains workspace assets without any accompanying metadata.

C.  

A YAML configuration file that specifies the artifacts, resources, and configurations for the project.

D.  

A Docker image containing runtime environments and the source code of the assets

Question 22

A data engineer is running code in a Databricks Repo that is cloned from a central Git repository. A colleague of the data engineer informs them that changes have been made and synced to the central Git repository. The data engineer now needs to sync their Databricks Repo to get the changes from the central Git repository.

Which of the following Git operations does the data engineer need to run to accomplish this task?

Options:

A.  

Merge

B.  

Push

C.  

Pull

D.  

Commit

E.  

Clone

Question 23

Identify how the count_if function and count over a column containing NULL values behave.

Consider a table random_values with the data below:

col1
0
1
2
NULL
2
3

What would be the output of the query below?

SELECT count_if(col1 > 1) AS count_a, count(*) AS count_b, count(col1) AS count_c FROM random_values;

Options:

A.  

3 6 5

B.  

4 6 5

C.  

3 6 6

D.  

4 6 6
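The query's three counters treat NULL differently. The plain-Python sketch below uses None to stand in for SQL NULL and mirrors each aggregate for the six values of col1: count_if counts only rows where the condition is true (a comparison with NULL is never true), count(*) counts every row, and count(col1) skips NULLs. The helper function is a hypothetical stand-in, not a Spark API.

```python
# Values of col1 in random_values; None stands in for SQL NULL
col1 = [0, 1, 2, None, 2, 3]

def count_if(values, predicate):
    """Count rows where the predicate is true; NULL rows never satisfy it."""
    return sum(1 for v in values if v is not None and predicate(v))

count_a = count_if(col1, lambda v: v > 1)        # matches 2, 2, 3
count_b = len(col1)                              # count(*) counts every row
count_c = sum(1 for v in col1 if v is not None)  # count(col1) skips NULLs
print(count_a, count_b, count_c)  # 3 6 5
```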

Question 24

A new data engineering team, whose group is named team, has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which of the following commands can be used to grant full permissions on the table to the new data engineering team?

Options:

A.  

GRANT ALL PRIVILEGES ON TABLE sales TO team;

B.  

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

C.  

GRANT SELECT ON TABLE sales TO team;

D.  

GRANT USAGE ON TABLE sales TO team;

E.  

GRANT ALL PRIVILEGES ON TABLE team TO sales;

Question 25

Which of the following tools is used by Auto Loader to process data incrementally?

Options:

A.  

Checkpointing

B.  

Spark Structured Streaming

C.  

Data Explorer

D.  

Unity Catalog

E.  

Databricks SQL

Question 26

Which of the following data workloads will utilize a Gold table as its source?

Options:

A.  

A job that enriches data by parsing its timestamps into a human-readable format

B.  

A job that aggregates uncleaned data to create standard summary statistics

C.  

A job that cleans data by removing malformatted records

D.  

A job that queries aggregated data designed to feed into a dashboard

E.  

A job that ingests raw data from a streaming source into the Lakehouse

Question 27

In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?

Options:

A.  

When another task needs to be replaced by the new task

B.  

When another task needs to fail before the new task begins

C.  

When another task has the same dependency libraries as the new task

D.  

When another task needs to use as little compute resources as possible

E.  

When another task needs to successfully complete before the new task begins

Question 28

A data engineer needs to create a table in Databricks using data from a CSV file at location /path/to/csv.

They run the following command:

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

A.  

None of these lines of code are needed to successfully complete the task

B.  

USING CSV

C.  

FROM CSV

D.  

USING DELTA

E.  

FROM "path/to/csv"

Question 29

A data organization leader is upset about the data analysis team’s reports being different from the data engineering team’s reports. The leader believes the siloed nature of their organization’s data engineering and data analysis architectures is to blame.

Which of the following describes how a data lakehouse could alleviate this issue?

Options:

A.  

Both teams would autoscale their work as data size evolves

B.  

Both teams would use the same source of truth for their work

C.  

Both teams would reorganize to report to the same department

D.  

Both teams would be able to collaborate on projects in real-time

E.  

Both teams would respond more quickly to ad-hoc requests

Question 30

A data engineer is working with two tables. Each of these tables is displayed below in its entirety.

The data engineer runs the following query to join these tables together:

Which of the following will be returned by the above query?

Options:

A.  

Option A

B.  

Option B

C.  

Option C

D.  

Option D

E.  

Option E

Question 31

A data engineer needs to conduct exploratory analysis on data residing in a database within the company's custom-defined cloud network. The data engineer is using SQL for this task.

Which type of SQL Warehouse will enable the data engineer to process large numbers of queries quickly and cost-effectively?

Options:

A.  

Serverless compute for notebooks

B.  

Serverless SQL Warehouse

C.  

Classic SQL Warehouse

D.  

Pro SQL Warehouse

Question 32

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?

Options:

A.  

processingTime(1)

B.  

trigger(availableNow=True)

C.  

trigger(parallelBatch=True)

D.  

trigger(processingTime="once")

E.  

trigger(continuous="once")
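The PySpark call for this behavior is `.trigger(availableNow=True)` on the stream writer (Spark 3.3 and later). The plain-Python model below, with hypothetical file names and a hypothetical function, illustrates its semantics for a file source: the files present when the query starts are split into micro-batches that respect a rate limit such as maxFilesPerTrigger, and the query stops once they have all been processed. By contrast, a one-time trigger processes the available data in a single batch.

```python
def available_now_batches(available_files, max_files_per_trigger):
    """Model of trigger(availableNow=True) for a file source: split the files that
    exist when the query starts into rate-limited micro-batches, then stop."""
    if max_files_per_trigger < 1:
        raise ValueError("max_files_per_trigger must be at least 1")
    snapshot = list(available_files)  # files arriving after start wait for the next run
    return [
        snapshot[i:i + max_files_per_trigger]
        for i in range(0, len(snapshot), max_files_per_trigger)
    ]

print(available_now_batches(["f1", "f2", "f3", "f4", "f5"], 2))
# [['f1', 'f2'], ['f3', 'f4'], ['f5']]
```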

Question 33

In order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing, which of the following two approaches is used by Spark to record the offset range of the data being processed in each trigger?

Options:

A.  

Checkpointing and Write-ahead Logs

B.  

Structured Streaming cannot record the offset range of the data being processed in each trigger.

C.  

Replayable Sources and Idempotent Sinks

D.  

Write-ahead Logs and Idempotent Sinks

E.  

Checkpointing and Idempotent Sinks
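A toy Python model of the idea follows. Spark records the offset range of each micro-batch in the checkpoint location before processing it (a write-ahead log) and records a commit once the batch's output is written, so on restart a planned-but-uncommitted batch can be reprocessed with exactly the same offsets. The class and method names are hypothetical, and real checkpoints persist this information as files rather than in memory.

```python
class CheckpointModel:
    """Toy in-memory model of a streaming checkpoint: an offset log written
    ahead of processing and a commit log written after the output lands."""

    def __init__(self):
        self.offset_log = {}     # batch_id -> (start, end) offsets, logged BEFORE processing
        self.commit_log = set()  # batch_ids whose output was fully written

    def plan_batch(self, batch_id, start, end):
        # Write-ahead step: record the offset range before touching any data
        self.offset_log[batch_id] = (start, end)

    def commit(self, batch_id):
        # Called only after the sink has written the batch's output
        self.commit_log.add(batch_id)

    def batch_to_resume(self):
        """On restart, return (batch_id, offsets) of the earliest uncommitted batch, or None."""
        pending = sorted(set(self.offset_log) - self.commit_log)
        if not pending:
            return None
        return pending[0], self.offset_log[pending[0]]

cp = CheckpointModel()
cp.plan_batch(0, 0, 100)
cp.commit(0)
cp.plan_batch(1, 100, 250)  # suppose the job crashes before batch 1 commits
print(cp.batch_to_resume())  # (1, (100, 250))
```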

Question 34

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

Options:

A.  

trigger("5 seconds")

B.  

trigger()

C.  

trigger(once="5 seconds")

D.  

trigger(processingTime="5 seconds")

E.  

trigger(continuous="5 seconds")

Question 35

Which of the following describes the relationship between Gold tables and Silver tables?

Options:

A.  

Gold tables are more likely to contain aggregations than Silver tables.

B.  

Gold tables are more likely to contain valuable data than Silver tables.

C.  

Gold tables are more likely to contain a less refined view of data than Silver tables.

D.  

Gold tables are more likely to contain more data than Silver tables.

E.  

Gold tables are more likely to contain truthful data than Silver tables.

Question 36

A data engineer wants to create a new table containing the names of customers who live in France.

They have written the following command:

CREATE TABLE customersInFrance
_____ AS
SELECT id,
  firstName,
  lastName
FROM customerLocations
WHERE country = 'FRANCE';

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (PII).

Which line of code fills in the above blank to successfully complete the task?

Options:

A.  

COMMENT "Contains PII"

B.  

511

C.  

"COMMENT PII"

D.  

TBLPROPERTIES PII

Question 37

A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which of the following approaches can be used to identify the owner of new_table?

Options:

A.  

Review the Permissions tab in the table's page in Data Explorer

B.  

All of these options can be used to identify the owner of the table

C.  

Review the Owner field in the table's page in Data Explorer

D.  

Review the Owner field in the table's page in the cloud storage solution

E.  

There is no way to identify the owner of the table

Question 38

Which of the following is hosted completely in the control plane of the classic Databricks architecture?

Options:

A.  

Worker node

B.  

JDBC data source

C.  

Databricks web application

D.  

Databricks Filesystem

E.  

Driver node

Question 39

A data engineer is designing an ETL pipeline to process both streaming and batch data from multiple sources. The pipeline must ensure data quality, handle schema evolution, and provide easy maintenance. The team is considering using Delta Live Tables (DLT) in Databricks to achieve these goals. They want to understand the key features and benefits of DLT that make it suitable for this use case.

Why is Delta Live Tables (DLT) an appropriate choice?

Options:

A.  

Automatic data quality checks, built-in support for schema evolution, and declarative pipeline development

B.  

Manual schema enforcement, high operational overhead, and limited scalability

C.  

Requires custom code for data quality checks, no support for streaming data, and complex pipeline maintenance

D.  

Supports only batch processing, no data versioning, and high infrastructure costs

Question 40

Which of the following describes a scenario in which a data engineer will want to use a single-node cluster?

Options:

A.  

When they are working interactively with a small amount of data

B.  

When they are running automated reports to be refreshed as quickly as possible

C.  

When they are working with SQL within Databricks SQL

D.  

When they are concerned about the ability to automatically scale with larger data

E.  

When they are manually running reports with a large amount of data

Question 41

A data engineer who is new to Python needs to create a Python function that adds two integers together and returns the sum.

Which of the following code blocks can the data engineer use to complete this task?


Options:

A.  

Option A

B.  

Option B

C.  

Option C

D.  

Option D

E.  

Option E
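The five candidate code blocks were images that are not reproduced here. For reference, a correct answer defines a function with `def`, takes two parameters, and uses `return` to hand back their sum; the function name below is hypothetical.

```python
def add_integers(a: int, b: int) -> int:
    """Return the sum of two integers."""
    return a + b

print(add_integers(2, 3))  # 5
```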

Question 42

A data engineer needs to ingest from both streaming and batch sources for a firm that relies on highly accurate data. Occasionally, some of the data picked up by the sensors that provide a streaming input are outside the expected parameters. If this occurs, the data must be dropped, but the stream should not fail.

Which feature of Delta Live Tables meets this requirement?

Options:

A.  

Monitoring

B.  

Change Data Capture

C.  

Expectations

D.  

Error Handling

Question 43

Which of the following statements regarding the relationship between Silver tables and Bronze tables is always true?

Options:

A.  

Silver tables contain a less refined, less clean view of data than Bronze data.

B.  

Silver tables contain aggregates while Bronze data is unaggregated.

C.  

Silver tables contain more data than Bronze tables.

D.  

Silver tables contain a more refined and cleaner view of data than Bronze tables.

E.  

Silver tables contain less data than Bronze tables.

Question 44

In which of the following file formats is data from Delta Lake tables primarily stored?

Options:

A.  

Delta

B.  

CSV

C.  

Parquet

D.  

JSON

E.  

A proprietary, optimized format specific to Databricks

Question 45

A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task.

Which of the following approaches could be used by the data engineering team to complete this task?

Options:

A.  

They could submit a feature request with Databricks to add this functionality.

B.  

They could wrap the queries using PySpark and use Python’s control flow system to determine when to run the final query.

C.  

They could only run the entire program on Sundays.

D.  

They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.

E.  

They could redesign the data model to separate the data used in the final query into a new table.
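A minimal sketch of the control-flow approach in option B, assuming the queries are passed in as strings and executed by a caller-supplied `run_query` callable (in a Databricks notebook that could wrap `spark.sql`). The function and parameter names are hypothetical; the Sunday check uses Python's `date.weekday()`, where Monday is 0 and Sunday is 6.

```python
from datetime import date

def run_program(queries, final_query, run_query, today=None):
    """Run every query daily, but run final_query only on Sundays.

    run_query is a caller-supplied callable; `today` can be injected for testing.
    """
    today = today or date.today()
    for query in queries:
        run_query(query)
    if today.weekday() == 6:  # date.weekday(): Monday is 0, Sunday is 6
        run_query(final_query)

executed = []
run_program(["q1", "q2"], "final", executed.append, today=date(2024, 6, 2))  # a Sunday
print(executed)  # ['q1', 'q2', 'final']
```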
