
Databricks Certified Data Engineer Associate Exam Question and Answers

Databricks Certified Data Engineer Associate Exam

Last Update Apr 25, 2024
Total Questions : 90

We are offering free Databricks-Certified-Data-Engineer-Associate exam questions. Just sign up with your details, work through the free Databricks-Certified-Data-Engineer-Associate sample questions, and then move on to the complete pool of Databricks Certified Data Engineer Associate Exam practice questions.

Databricks-Certified-Data-Engineer-Associate PDF: $35 (regular $99.99)

Databricks-Certified-Data-Engineer-Associate Testing Engine: $42 (regular $119.99)

Databricks-Certified-Data-Engineer-Associate PDF + Testing Engine: $56 (regular $159.99)
Questions 1

A data engineer has joined an existing project and they see the following query in the project repository:

CREATE STREAMING LIVE TABLE loyal_customers AS

SELECT customer_id

FROM STREAM(LIVE.customers)

WHERE loyalty_level = 'high';

Which of the following describes why the STREAM function is included in the query?

Options:

A.  

The STREAM function is not needed and will cause an error.

B.  

The table being created is a live table.

C.  

The customers table is a streaming live table.

D.  

The customers table is a reference to a Structured Streaming query on a PySpark DataFrame.

E.  

The data in the customers table has been updated since its last run.

Discussion 0
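The STREAM() function signals that the referenced dataset is itself a streaming source that should be read incrementally. A minimal, hypothetical DLT SQL sketch (the upstream path and column names are assumptions, not part of the original query):

```sql
-- Hypothetical upstream streaming live table; names are for illustration only.
CREATE OR REFRESH STREAMING LIVE TABLE customers
AS SELECT * FROM cloud_files("/mnt/raw/customers", "json");

-- The downstream table reads the streaming live table incrementally
-- via STREAM(LIVE.<table>), as in the question's query.
CREATE OR REFRESH STREAMING LIVE TABLE loyal_customers
AS SELECT customer_id
FROM STREAM(LIVE.customers)
WHERE loyalty_level = 'high';
```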
Questions 2

A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have been deleted.

Which of the following explains why the data files are no longer present?

Options:

A.  

The VACUUM command was run on the table

B.  

The TIME TRAVEL command was run on the table

C.  

The DELETE HISTORY command was run on the table

D.  

The OPTIMIZE command was run on the table

E.  

The HISTORY command was run on the table

Discussion 0
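The scenario can be sketched with two statements; VACUUM permanently removes data files older than the retention threshold, which breaks time travel to versions that depended on those files. Table name and retention value here are illustrative only:

```sql
-- Remove data files no longer referenced by the current table version
-- and older than the retention period (168 hours = the 7-day default):
VACUUM my_table RETAIN 168 HOURS;

-- Time travel to a version whose files were vacuumed will now fail:
SELECT * FROM my_table TIMESTAMP AS OF date_sub(current_date(), 3);
```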
Questions 3

A data architect has determined that a table of the following format is necessary:

Which of the following code blocks uses SQL DDL commands to create an empty Delta table in the above format regardless of whether a table already exists with this name?

Options:

A.  

Option A

B.  

Option B

C.  

Option C

D.  

Option D

E.  

Option E

Discussion 0
Questions 4

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

Options:

A.  

A data lakehouse provides storage solutions for structured and unstructured data.

B.  

A data lakehouse supports ACID-compliant transactions.

C.  

A data lakehouse allows the use of SQL queries to examine data.

D.  

A data lakehouse stores data in open formats.

E.  

A data lakehouse enables machine learning and artificial intelligence workloads.

Discussion 0
Questions 5

A new data engineering team, with the group name team, has been assigned to an ELT project. The new data engineering team will need full privileges on the database customers to fully manage the project.

Which of the following commands can be used to grant full permissions on the database to the new data engineering team?

Options:

A.  

GRANT USAGE ON DATABASE customers TO team;

B.  

GRANT ALL PRIVILEGES ON DATABASE team TO customers;

C.  

GRANT SELECT PRIVILEGES ON DATABASE customers TO teams;

D.  

GRANT SELECT CREATE MODIFY USAGE PRIVILEGES ON DATABASE customers TO team;

E.  

GRANT ALL PRIVILEGES ON DATABASE customers TO team;

Discussion 0
Questions 6

Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?

Options:

A.  

Manually programming in an alert system in each cell of the Notebook

B.  

Setting up an Alert in the Job page

C.  

Setting up an Alert in the Notebook

D.  

There is no way to notify the Job owner in the case of Job failure

E.  

MLflow Model Registry Webhooks

Discussion 0
Questions 7

A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?

Options:

A.  

SELECT * FROM sales

B.  

spark.delta.table

C.  

spark.sql

D.  

There is no way to share data between PySpark and SQL.

E.  

spark.table

Discussion 0
Questions 8

Which of the following SQL keywords can be used to convert a table from a long format to a wide format?

Options:

A.  

PIVOT

B.  

CONVERT

C.  

WHERE

D.  

TRANSFORM

E.  

SUM

Discussion 0
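PIVOT turns distinct values of one column into separate columns, converting long format to wide. A hedged Spark SQL sketch; the sales_long table and its columns are assumptions:

```sql
-- Long format: one row per (product, quarter); wide format: one row per
-- product with a column per quarter.
SELECT * FROM (
  SELECT product, quarter, revenue FROM sales_long
)
PIVOT (
  SUM(revenue) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4')
);
```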
Questions 9

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?

Options:

A.  

None of these changes will need to be made

B.  

The pipeline will need to stop using the medallion-based multi-hop architecture

C.  

The pipeline will need to be written entirely in SQL

D.  

The pipeline will need to use a batch source in place of a streaming source

E.  

The pipeline will need to be written entirely in Python

Discussion 0
Questions 10

A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells.

Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?

Options:

A.  

It is not possible to use SQL in a Python notebook

B.  

They can attach the cell to a SQL endpoint rather than a Databricks cluster

C.  

They can simply write SQL syntax in the cell

D.  

They can add %sql to the first line of the cell

E.  

They can change the default language of the notebook to SQL

Discussion 0
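A single cell of a Python notebook can be switched to SQL with a magic command on its first line, leaving every other cell in Python. A sketch of such a cell (the table name is an assumption):

```
%sql
SELECT COUNT(*) FROM my_table
```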
Questions 11

A new data engineering team has been assigned to work on a project. The team will need access to database customers in order to see what tables already exist. The team has its own group, team.

Which of the following commands can be used to grant the necessary permission on the entire database to the new team?

Options:

A.  

GRANT VIEW ON CATALOG customers TO team;

B.  

GRANT CREATE ON DATABASE customers TO team;

C.  

GRANT USAGE ON CATALOG team TO customers;

D.  

GRANT CREATE ON DATABASE team TO customers;

E.  

GRANT USAGE ON DATABASE customers TO team;

Discussion 0
Questions 12

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which of the following locations can the data engineer review their permissions on the table?

Options:

A.  

Databricks Filesystem

B.  

Jobs

C.  

Dashboards

D.  

Repos

E.  

Data Explorer

Discussion 0
Questions 13

A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as is and will accumulate in the directory. The data engineer needs to identify which files are new since the previous run in the pipeline, and set up the pipeline to only ingest those new files with each run.

Which of the following tools can the data engineer use to solve this problem?

Options:

A.  

Unity Catalog

B.  

Delta Lake

C.  

Databricks SQL

D.  

Data Explorer

E.  

Auto Loader

Discussion 0
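Auto Loader keeps track of which files in a directory have already been ingested and picks up only new arrivals on each run, leaving the source files in place. A hedged DLT SQL sketch using the cloud_files source; the landing path and table name are assumptions:

```sql
-- Auto Loader (cloud_files) records ingested files in checkpoint state,
-- so each pipeline run processes only files that arrived since the last run.
CREATE OR REFRESH STREAMING LIVE TABLE raw_orders
AS SELECT * FROM cloud_files("/mnt/landing/orders", "json");
```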
Questions 14

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

Options:

A.  

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

B.  

Records that violate the expectation cause the job to fail.

C.  

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

D.  

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

E.  

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

Discussion 0
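For contrast, Delta Live Tables expectations support three behaviors, and only ON VIOLATION FAIL UPDATE stops the job. A sketch of the three clause variants as they would appear inside a CREATE ... LIVE TABLE statement (constraint names are illustrative):

```sql
-- Default: violating records are kept in the target and reported
-- as invalid in the event log.
CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01'),

-- Violating records are dropped from the target dataset:
CONSTRAINT valid_timestamp_drop EXPECT (timestamp > '2020-01-01')
  ON VIOLATION DROP ROW,

-- Any violating record causes the pipeline update to fail:
CONSTRAINT valid_timestamp_fail EXPECT (timestamp > '2020-01-01')
  ON VIOLATION FAIL UPDATE
```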
Questions 15

A data engineer is using the following code block as part of a batch ingestion pipeline to read from a table:

Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?

Options:

A.  

Replace predict with a stream-friendly prediction function

B.  

Replace schema(schema) with option ("maxFilesPerTrigger", 1)

C.  

Replace "transactions" with the path to the location of the Delta table

D.  

Replace format("delta") with format("stream")

E.  

Replace spark.read with spark.readStream

Discussion 0
Questions 16

A data engineer is attempting to drop a Spark SQL table my_table and runs the following command:

DROP TABLE IF EXISTS my_table;

After running this command, the engineer notices that the data files and metadata files have been deleted from the file system.

Which of the following describes why all of these files were deleted?

Options:

A.  

The table was managed

B.  

The table's data was smaller than 10 GB

C.  

The table's data was larger than 10 GB

D.  

The table was external

E.  

The table did not have a location

Discussion 0
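The managed-versus-external distinction can be sketched in two DDL statements; the external location path is an assumption for illustration:

```sql
-- Managed table: Databricks controls the storage location, so
-- DROP TABLE deletes both the metadata and the underlying data files.
CREATE TABLE my_table (id INT);
DROP TABLE IF EXISTS my_table;

-- External table: an explicit LOCATION is supplied, so DROP TABLE
-- removes only the metadata and leaves the data files in place.
CREATE TABLE my_external (id INT) LOCATION '/mnt/external/my_external';
```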
Questions 17

Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?

Options:

A.  

Cloud-specific integrations

B.  

Simplified governance

C.  

Ability to scale storage

D.  

Ability to scale workloads

E.  

Avoiding vendor lock-in

Discussion 0
Questions 18

A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the quality level.

Which of the following tools can the data engineer use to solve this problem?

Options:

A.  

Unity Catalog

B.  

Data Explorer

C.  

Delta Lake

D.  

Delta Live Tables

E.  

Auto Loader

Discussion 0
Questions 19

A data engineer only wants to execute the final block of a Python program if the Python variable day_of_week is equal to 1 and the Python variable review_period is True.

Which of the following control flow statements should the data engineer use to begin this conditionally executed code block?

Options:

A.  

if day_of_week = 1 and review_period:

B.  

if day_of_week = 1 and review_period = "True":

C.  

if day_of_week == 1 and review_period == "True":

D.  

if day_of_week == 1 and review_period:

E.  

if day_of_week = 1 & review_period: = "True":

Discussion 0
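The options hinge on two Python points: `=` assigns while `==` compares, and a boolean is truthy on its own, so comparing it against the string "True" is wrong. A runnable sketch:

```python
day_of_week = 1
review_period = True

# `==` compares; a plain boolean needs no further comparison.
if day_of_week == 1 and review_period:
    executed = True
else:
    executed = False

# Comparing the boolean against the string "True" silently fails,
# because True == "True" evaluates to False.
executed_wrong = (day_of_week == 1 and review_period == "True")
```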
Questions 20

A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task.

Which of the following approaches could be used by the data engineering team to complete this task?

Options:

A.  

They could submit a feature request with Databricks to add this functionality.

B.  

They could wrap the queries using PySpark and use Python’s control flow system to determine when to run the final query.

C.  

They could only run the entire program on Sundays.

D.  

They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.

E.  

They could redesign the data model to separate the data used in the final query into a new table.

Discussion 0
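Wrapping the queries in PySpark lets ordinary Python control flow decide when the final query runs. A minimal sketch of the gating logic; the function names and the run_sql callable are hypothetical stand-ins (on Databricks, run_sql might be spark.sql):

```python
import datetime

def should_run_final_query(today=None):
    """Return True only on Sundays (Python's weekday(): Monday=0 ... Sunday=6)."""
    today = today or datetime.date.today()
    return today.weekday() == 6

def run_program(queries, final_query, run_sql, today=None):
    """Run every query daily, but gate the final query on the day of week."""
    for q in queries:
        run_sql(q)
    if should_run_final_query(today):
        run_sql(final_query)
```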
Questions 21

Which of the following commands will return the location of database customer360?

Options:

A.  

DESCRIBE LOCATION customer360;

B.  

DROP DATABASE customer360;

C.  

DESCRIBE DATABASE customer360;

D.  

ALTER DATABASE customer360 SET DBPROPERTIES ('location' = '/user');

E.  

USE DATABASE customer360;

Discussion 0
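DESCRIBE DATABASE returns the database's metadata, including its location. A short sketch of both forms:

```sql
-- Returns the database name, comment, location, and owner:
DESCRIBE DATABASE customer360;

-- EXTENDED additionally returns any database properties:
DESCRIBE DATABASE EXTENDED customer360;
```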
Questions 22

Which of the following benefits is provided by the array functions from Spark SQL?

Options:

A.  

An ability to work with data in a variety of types at once

B.  

An ability to work with data within certain partitions and windows

C.  

An ability to work with time-related data in specified intervals

D.  

An ability to work with complex, nested data ingested from JSON files

E.  

An ability to work with an array of tables for procedural automation

Discussion 0
Questions 23

Which file format is used for storing Delta Lake Table?

Options:

A.  

Parquet

B.  

Delta

C.  

CSV

D.  

JSON

Discussion 0
Questions 24

A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.

Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?

Options:

A.  

pyspark.sql.types.DateType

B.  

datetime

C.  

pyspark.sql.types.TimestampType

D.  

Cron syntax

E.  

There is no way to represent and submit this information programmatically

Discussion 0
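Databricks Job schedules are expressed as Quartz cron expressions, which can be submitted programmatically through the Jobs API's schedule object. A hedged sketch of such a payload fragment; the specific cron value and timezone are illustrative, not taken from the question:

```python
# Hypothetical Jobs API schedule fragment. Field names follow the
# documented CronSchedule object; the values are assumptions.
schedule = {
    "quartz_cron_expression": "0 30 7 ? * MON-FRI",  # 07:30 on weekdays
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED",
}

fields = sorted(schedule)
```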
Questions 25

A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their always-on SQL endpoint. They claim that this issue is present when many members of the team are running small queries simultaneously. They ask the data engineering team for help. The data engineering team notices that each of the team’s queries uses the same SQL endpoint.

Which of the following approaches can the data engineering team use to improve the latency of the team’s queries?

Options:

A.  

They can increase the cluster size of the SQL endpoint.

B.  

They can increase the maximum bound of the SQL endpoint’s scaling range.

C.  

They can turn on the Auto Stop feature for the SQL endpoint.

D.  

They can turn on the Serverless feature for the SQL endpoint.

E.  

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to “Reliability Optimized.”

Discussion 0
Questions 26

A data engineer has been given a new record of data:

id STRING = 'a1'

rank INTEGER = 6

rating FLOAT = 9.4

Which of the following SQL commands can be used to append the new record to an existing Delta table my_table?

Options:

A.  

INSERT INTO my_table VALUES ('a1', 6, 9.4)

B.  

my_table UNION VALUES ('a1', 6, 9.4)

C.  

INSERT VALUES ( 'a1' , 6, 9.4) INTO my_table

D.  

UPDATE my_table VALUES ('a1', 6, 9.4)

E.  

UPDATE VALUES ('a1', 6, 9.4) my_table

Discussion 0
Questions 27

A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which of the following approaches can be used to identify the owner of new_table?

Options:

A.  

Review the Permissions tab in the table's page in Data Explorer

B.  

All of these options can be used to identify the owner of the table

C.  

Review the Owner field in the table's page in Data Explorer

D.  

Review the Owner field in the table's page in the cloud storage solution

E.  

There is no way to identify the owner of the table

Discussion 0