Databricks Certified Data Engineer Professional Exam Questions and Answers

Last Update Nov 30, 2025
Total Questions : 195

Questions 1

The data engineer is using Spark's MEMORY_ONLY storage level.

Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?

Options:

A.  

Size on Disk is > 0

B.  

The number of Cached Partitions > the number of Spark Partitions

C.  

The RDD Block Name included the '' annotation signaling failure to cache

D.  

On Heap Memory Usage is within 75% of Off Heap Memory Usage

Questions 2

Each configuration below is identical to the extent that each cluster has 400 GB total of RAM, 160 total cores and only one Executor per VM.

Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

Options:

A.  

• Total VMs: 1

• 400 GB per Executor

• 160 Cores / Executor

B.  

• Total VMs: 8

• 50 GB per Executor

• 20 Cores / Executor

C.  

• Total VMs: 4

• 100 GB per Executor

• 40 Cores / Executor

D.  

• Total VMs: 2

• 200 GB per Executor

• 80 Cores / Executor
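
A quick check, in plain Python, that the four layouts above really do share the same totals and differ only in how many executors the work is spread across. The dictionary simply transcribes options A to D; nothing here measures performance, and the helper name `totals` is hypothetical.

```python
# Cluster layouts transcribed from options A-D (one executor per VM).
configs = {
    "A": {"vms": 1, "gb_per_executor": 400, "cores_per_executor": 160},
    "B": {"vms": 8, "gb_per_executor": 50, "cores_per_executor": 20},
    "C": {"vms": 4, "gb_per_executor": 100, "cores_per_executor": 40},
    "D": {"vms": 2, "gb_per_executor": 200, "cores_per_executor": 80},
}


def totals(cfg):
    """Return (total RAM in GB, total cores) for one layout."""
    return (
        cfg["vms"] * cfg["gb_per_executor"],
        cfg["vms"] * cfg["cores_per_executor"],
    )


for name, cfg in configs.items():
    ram, cores = totals(cfg)
    print(f"{name}: {cfg['vms']} executor(s), {ram} GB, {cores} cores")
```

Because the totals are identical, the only variable left to reason about is how the same resources are divided among executors.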

Questions 3

Two of the most common data locations on Databricks are the DBFS root storage and external object storage mounted with dbutils.fs.mount().

Which of the following statements is correct?

Options:

A.  

DBFS is a file system protocol that allows users to interact with files stored in object storage using syntax and guarantees similar to Unix file systems.

B.  

By default, both the DBFS root and mounted data sources are only accessible to workspace administrators.

C.  

The DBFS root is the most secure location to store data, because mounted storage volumes must have full public read and write permissions.

D.  

Neither the DBFS root nor mounted storage can be accessed when using %sh in a Databricks notebook.

E.  

The DBFS root stores files in ephemeral block volumes attached to the driver, while mounted directories will always persist saved data to external storage between sessions.

Questions 4

In order to prevent accidental commits to production data, a senior data engineer has instituted a policy that all development work will reference clones of Delta Lake tables. After testing both deep and shallow clone, development tables are created using shallow clone.

A few weeks after initial table creation, the cloned versions of several tables implemented as Type 1 Slowly Changing Dimension (SCD) stop working. The transaction logs for the source tables show that vacuum was run the day before.

Why are the cloned tables no longer working?

Options:

A.  

The data files compacted by vacuum are not tracked by the cloned metadata; running refresh on the cloned table will pull in recent changes.

B.  

Because Type 1 changes overwrite existing records, Delta Lake cannot guarantee data consistency for cloned tables.

C.  

The metadata created by the clone operation is referencing data files that were purged as invalid by the vacuum command

D.  

Running vacuum automatically invalidates any shallow clones of a table; deep clone should always be used when a cloned table will be repeatedly queried.
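
A toy model, in plain Python, of how a shallow clone relates to the data files of its source table. This is not Delta Lake code: the file names and the `vacuum` helper are hypothetical, and the retention period is deliberately ignored. It only mimics the idea that a shallow clone copies metadata (file references) rather than the files themselves.

```python
# Toy model: "storage" is the set of data files that physically exist.
storage = {"part-0.parquet", "part-1.parquet"}

# The source table's current version references both files.
source_files = {"part-0.parquet", "part-1.parquet"}

# A shallow clone copies only metadata: the same file references.
clone_files = set(source_files)

# A Type 1 update on the source rewrites part-1 into a new file.
storage.add("part-2.parquet")
source_files = {"part-0.parquet", "part-2.parquet"}


def vacuum(existing, referenced):
    """Keep only files the source still references (retention ignored)."""
    return existing & referenced


storage = vacuum(storage, source_files)

# Files the clone still points at but that no longer exist.
missing = clone_files - storage
print(sorted(missing))
```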

Questions 5

A data engineer wants to create a cluster using the Databricks CLI for a big ETL pipeline. The cluster should have five workers, one driver of type i3.xlarge, and should use the '14.3.x-scala2.12' runtime.

Which command should the data engineer use?

Options:

A.  

databricks clusters create 14.3.x-scala2.12 --num-workers 5 --node-type-id i3.xlarge --cluster-name DataEngineer_cluster

B.  

databricks clusters add 14.3.x-scala2.12 --num-workers 5 --node-type-id i3.xlarge --cluster-name Data Engineer_cluster

C.  

databricks compute add 14.3.x-scala2.12 --num-workers 5 --node-type-id i3.xlarge --cluster-name Data Engineer_cluster

D.  

databricks compute create 14.3.x-scala2.12 --num-workers 5 --node-type-id i3.xlarge --cluster-name Data Engineer_cluster

Questions 6

Which statement describes integration testing?

Options:

A.  

Validates interactions between subsystems of your application

B.  

Requires an automated testing framework

C.  

Requires manual intervention

D.  

Validates an application use case

E.  

Validates behavior of individual elements of your application

Questions 7

A transactions table has been liquid clustered on the columns product_id, user_id, and event_date.

Which operation lacks support for cluster on write?

Options:

A.  

spark.writeStream.format('delta').mode('append')

B.  

CTAS and RTAS statements

C.  

INSERT INTO operations

D.  

spark.write.format('delta').mode('append')

Questions 8

The data engineering team maintains the following code:

Assuming that this code produces logically correct results and the data in the source table has been de-duplicated and validated, which statement describes what will occur when this code is executed?

Options:

A.  

The silver_customer_sales table will be overwritten by aggregated values calculated from all records in the gold_customer_lifetime_sales_summary table as a batch job.

B.  

A batch job will update the gold_customer_lifetime_sales_summary table, replacing only those rows that have different values than the current version of the table, using customer_id as the primary key.

C.  

The gold_customer_lifetime_sales_summary table will be overwritten by aggregated values calculated from all records in the silver_customer_sales table as a batch job.

D.  

An incremental job will leverage running information in the state store to update aggregate values in the gold_customer_lifetime_sales_summary table.

E.  

An incremental job will detect if new rows have been written to the silver_customer_sales table; if new rows are detected, all aggregates will be recalculated and used to overwrite the gold_customer_lifetime_sales_summary table.

Questions 9

A Databricks SQL dashboard has been configured to monitor the total number of records present in a collection of Delta Lake tables using the following query pattern:

SELECT COUNT(*) FROM table

Which of the following describes how results are generated each time the dashboard is updated?

Options:

A.  

The total count of rows is calculated by scanning all data files

B.  

The total count of rows will be returned from cached results unless REFRESH is run

C.  

The total count of records is calculated from the Delta transaction logs

D.  

The total count of records is calculated from the parquet file metadata

E.  

The total count of records is calculated from the Hive metastore

Questions 10

Review the following error traceback:

Which statement describes the error being raised?

Options:

A.  

The code executed was PySpark but was executed in a Scala notebook.

B.  

There is no column in the table named heartrateheartrateheartrate

C.  

There is a type error because a column object cannot be multiplied.

D.  

There is a type error because a DataFrame object cannot be multiplied.

E.  

There is a syntax error because the heartrate column is not correctly identified as a column.

Questions 11

What statement is true regarding the retention of job run history?

Options:

A.  

It is retained until you export or delete job run logs

B.  

It is retained for 30 days, during which time you can deliver job run logs to DBFS or S3

C.  

It is retained for 60 days, during which you can export notebook run results to HTML

D.  

It is retained for 60 days, after which logs are archived

E.  

It is retained for 90 days or until the run-id is re-used through custom run configuration

Questions 12

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.

df has the following schema: device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT

Code block:

    df.withWatermark("event_time", "10 minutes")
      .groupBy(
        ________,
        "device_id"
      )
      .agg(
        avg("temp").alias("avg_temp"),
        avg("humidity").alias("avg_humidity")
      )
      .writeStream
      .format("delta")
      .saveAsTable("sensor_avg")

Which line of code correctly fills in the blank within the code block to complete this task?

Options:

A.  

window("event_time", "5 minutes").alias("time")

B.  

to_interval("event_time", "5 minutes").alias("time")

C.  

"event_time"

D.  

lag("event_time", "5 minutes").alias("time")
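
The notion of a non-overlapping five-minute interval (a tumbling window) can be sketched in plain Python, independent of Spark. The sample readings and the helper `window_start` are hypothetical, and aligning windows to the Unix epoch is an assumption of this sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its five-minute tumbling window."""
    return EPOCH + ((ts - EPOCH) // WINDOW) * WINDOW


# Hypothetical readings: (device_id, event_time, temp)
events = [
    (1, datetime(2025, 1, 1, 12, 0), 20.0),
    (1, datetime(2025, 1, 1, 12, 4), 22.0),
    (1, datetime(2025, 1, 1, 12, 5), 30.0),
]

# Group by (window start, device_id), mirroring the groupBy keys above.
buckets = defaultdict(list)
for device_id, event_time, temp in events:
    buckets[(window_start(event_time), device_id)].append(temp)

avg_temp = {key: sum(vals) / len(vals) for key, vals in buckets.items()}
```

Each event lands in exactly one bucket, which is what distinguishes a tumbling window from a sliding one.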

Questions 13

Given the following error traceback (from display(df.select(3*"heartrate"))) which shows AnalysisException: cannot resolve 'heartrateheartrateheartrate', which statement describes the error being raised?

Options:

A.  

There is a type error because a DataFrame object cannot be multiplied.

B.  

There is a syntax error because the heartrate column is not correctly identified as a column.

C.  

There is no column in the table named heartrateheartrateheartrate.

D.  

There is a type error because a column object cannot be multiplied.
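
For context, the expression 3*"heartrate" is evaluated by Python before Spark ever sees it. A minimal sketch of that evaluation, with no Spark involved:

```python
column_name = "heartrate"

# Multiplying a str by an int repeats the string; no Column object is created.
repeated = 3 * column_name
print(repeated)
```

The resulting string is what `select` then receives as a column name.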

Questions 14

The data engineering team is configuring environments for development, testing, and production before beginning migration of a new data pipeline. The team requires extensive testing on both the code and the data resulting from code execution, and the team wants to develop and test against data as similar to production data as possible.

A junior data engineer suggests that production data can be mounted to the development and testing environments, allowing pre-production code to execute against production data. Because all users have admin privileges in the development environment, the junior data engineer has offered to configure permissions and mount this data for the team.

Which statement captures best practices for this situation?

Options:

A.  

Because access to production data will always be verified using passthrough credentials, it is safe to mount data to any Databricks development environment.

B.  

All developer, testing and production code and data should exist in a single unified workspace; creating separate environments for testing and development further reduces risks.

C.  

In environments where interactive code will be executed, production data should only be accessible with read permissions; creating isolated databases for each environment further reduces risks.

D.  

Because Delta Lake versions all data and supports time travel, it is not possible for user error or malicious actors to permanently delete production data; as such, it is generally safe to mount production data anywhere.

Questions 15

A Delta Lake table representing metadata about content posts from users has the following schema:

user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE

Based on the above schema, which column is a good candidate for partitioning the Delta Table?

Options:

A.  

date

B.  

post_id

C.  

user_id

D.  

post_time

Questions 16

A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the below code is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table.

Before executing the code, running SHOW TABLES on the current database indicates the database contains only two tables: geo_lookup and sales.

Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?

Options:

A.  

Both commands will succeed. Executing SHOW TABLES will show that countries_af and sales_af have been registered as views.

B.  

Cmd 1 will succeed. Cmd 2 will search all accessible databases for a table or view named countries_af; if this entity exists, Cmd 2 will succeed.

C.  

Cmd 1 will succeed and Cmd 2 will fail; countries_af will be a Python variable representing a PySpark DataFrame.

D.  

Both commands will fail. No new variables, tables, or views will be created.

E.  

Cmd 1 will succeed and Cmd 2 will fail; countries_af will be a Python variable containing a list of strings.

Questions 17

A data engineer has created a new cluster using shared access mode with default configurations. The data engineer needs to allow the development team access to view the driver logs if needed.

What are the minimal cluster permissions that allow the development team to accomplish this?

Options:

A.  

CAN ATTACH TO

B.  

CAN MANAGE

C.  

CAN VIEW

D.  

CAN RESTART

Questions 18

Which statement describes Delta Lake Auto Compaction?

Options:

A.  

An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an optimize job is executed toward a default of 1 GB.

B.  

Before a Jobs cluster terminates, optimize is executed on all tables modified during the most recent job.

C.  

Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.

D.  

Data is queued in a messaging bus instead of committing data directly to memory; all data is committed from the messaging bus in one batch once the job is complete.

E.  

An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an optimize job is executed toward a default of 128 MB.

Questions 19

An hourly batch job is configured to ingest data files from a cloud object storage container where each batch represents all records produced by the source system in a given hour. The batch job to process these records into the Lakehouse is sufficiently delayed to ensure no late-arriving data is missed. The user_id field represents a unique key for the data, which has the following schema:

user_id BIGINT, username STRING, user_utc STRING, user_region STRING, last_login BIGINT, auto_pay BOOLEAN, last_updated BIGINT

New records are all ingested into a table named account_history which maintains a full record of all data in the same schema as the source. The next table in the system is named account_current and is implemented as a Type 1 table representing the most recent value for each unique user_id.

Assuming there are millions of user accounts and tens of thousands of records processed hourly, which implementation can be used to efficiently update the described account_current table as part of each hourly batch job?

Options:

A.  

Use Auto Loader to subscribe to new files in the account_history directory; configure a Structured Streaming trigger-once job to batch update newly detected files into the account_current table.

B.  

Overwrite the account_current table with each batch using the results of a query against the account_history table, grouping by user_id and filtering for the max value of last_updated.

C.  

Filter records in account_history using the last_updated field and the most recent hour processed, as well as the max last_login by user_id; write a merge statement to update or insert the most recent value for each user_id.

D.  

Use Delta Lake version history to get the difference between the latest version of account_history and one version prior, then write these records to account_current.

E.  

Filter records in account_history using the last_updated field and the most recent hour processed, making sure to deduplicate on username; write a merge statement to update or insert the most recent value for each username.
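
The general shape of a Type 1 update, keep the newest record per key in the batch and then upsert by that key, can be sketched with plain Python dictionaries. This is not a Delta Lake MERGE; the sample rows are hypothetical and only a subset of the schema's columns is used.

```python
# Hypothetical hourly batch: (user_id, username, last_updated)
batch = [
    (1, "ana", 100),
    (2, "bo", 105),
    (1, "ana_v2", 110),  # later change for the same user_id in this batch
]

# Current state of the Type 1 table, keyed by user_id.
account_current = {1: ("ana", 90), 3: ("cy", 80)}

# Step 1: keep only the most recent record per user_id in the batch.
latest = {}
for user_id, username, last_updated in batch:
    if user_id not in latest or last_updated > latest[user_id][1]:
        latest[user_id] = (username, last_updated)

# Step 2: upsert; matched keys are overwritten, unmatched keys are inserted.
account_current.update(latest)
```

Only the keys present in the batch are touched, so the cost scales with the batch rather than with the full table.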

Questions 20

Which is a key benefit of an end-to-end test?

Options:

A.  

It closely simulates real world usage of your application.

B.  

It pinpoints errors in the building blocks of your application.

C.  

It provides testing coverage for all code paths and branches.

D.  

It makes it easier to automate your test suite.

Questions 21

Which distribution does Databricks support for installing custom Python code packages?

Options:

A.  

sbt

B.  

CRAN

C.  

CRAM

D.  

npm

E.  

Wheels

F.  

jars

Questions 22

Which REST API call can be used to review the notebooks configured to run as tasks in a multi-task job?

Options:

A.  

/jobs/runs/list

B.  

/jobs/runs/get-output

C.  

/jobs/runs/get

D.  

/jobs/get

E.  

/jobs/list

Questions 23

An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code:

df = spark.read.format("parquet").load(f"/mnt/source/{date}")

Which code block should be used to create the date Python variable used in the above code block?

Options:

A.  

date = spark.conf.get("date")

B.  

input_dict = input()

date = input_dict["date"]

C.  

import sys

date = sys.argv[1]

D.  

date = dbutils.notebooks.getParam("date")

E.  

dbutils.widgets.text("date", "null")

date = dbutils.widgets.get("date")
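
For reference, a Python f-string substitutes a variable only when its name is wrapped in braces. A minimal sketch, using a hypothetical date value in place of the job parameter:

```python
# Hypothetical value; in a scheduled job it would come from the parameter.
date = "2025-11-30"

# Braces interpolate the variable.
path = f"/mnt/source/{date}"

# Parentheses are kept as literal text, so nothing is substituted.
literal = f"/mnt/source/(date)"

print(path)
print(literal)
```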

Questions 24

A junior data engineer seeks to leverage Delta Lake's Change Data Feed functionality to create a Type 1 table representing all of the values that have ever been valid for all rows in a bronze table created with the property delta.enableChangeDataFeed = true. They plan to execute the following code as a daily job:

Which statement describes the execution and results of running the above query multiple times?

Options:

A.  

Each time the job is executed, newly updated records will be merged into the target table, overwriting previous values with the same primary keys.

B.  

Each time the job is executed, the entire available history of inserted or updated records will be appended to the target table, resulting in many duplicate entries.

C.  

Each time the job is executed, the target table will be overwritten using the entire history of inserted or updated records, giving the desired result.

D.  

Each time the job is executed, the differences between the original and current versions are calculated; this may result in duplicate entries for some records.

E.  

Each time the job is executed, only those records that have been inserted or updated since the last execution will be appended to the target table giving the desired result.

Questions 25

Where in the Spark UI can one diagnose a performance problem induced by not leveraging predicate push-down?

Options:

A.  

In the Executor's log file, by grepping for "predicate push-down"

B.  

In the Stage's Detail screen, in the Completed Stages table, by noting the size of data read from the Input column

C.  

In the Storage Detail screen, by noting which RDDs are not stored on disk

D.  

In the Delta Lake transaction log, by noting the column statistics

E.  

In the Query Detail screen, by interpreting the Physical Plan

Questions 26

A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes that the Min, Median, and Max Durations for tasks in a particular stage show the minimum and median time to complete a task as roughly the same, but the max duration for a task to be roughly 100 times as long as the minimum.

Which situation is causing increased duration of the overall job?

Options:

A.  

Task queueing resulting from improper thread pool assignment.

B.  

Spill resulting from attached volume storage being too small.

C.  

Network latency due to some cluster nodes being in different regions from the source data

D.  

Skew caused by more data being assigned to a subset of spark-partitions.

E.  

Credential validation errors while pulling data from an external system.
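
The pattern described in the question, minimum and median close together with a far larger maximum, is easy to reproduce numerically. The durations below are hypothetical, and the comparison of maximum to median is only an illustrative heuristic, not a Spark metric.

```python
from statistics import median

# Hypothetical task durations in seconds for one stage.
durations = [2.0, 2.1, 1.9, 2.0, 2.2, 200.0]

t_min = min(durations)
t_median = median(durations)
t_max = max(durations)

# One straggler dominates: the stage cannot finish before its slowest task.
max_to_median = t_max / t_median
print(f"min={t_min}s median={t_median}s max={t_max}s")
```

Since a stage completes only when its last task does, a single long task sets the stage duration regardless of how fast the others are.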

Questions 27

A junior data engineer is migrating a workload from a relational database system to the Databricks Lakehouse. The source system uses a star schema, leveraging foreign key constraints and multi-table inserts to validate records on write.

Which consideration will impact the decisions made by the engineer while migrating this workload?

Options:

A.  

All Delta Lake transactions are ACID compliant against a single table, and Databricks does not enforce foreign key constraints.

B.  

Databricks only allows foreign key constraints on hashed identifiers, which avoid collisions in highly-parallel writes.

C.  

Foreign keys must reference a primary key field; multi-table inserts must leverage Delta Lake's upsert functionality.

D.  

Committing to multiple tables simultaneously requires taking out multiple table locks and can lead to a state of deadlock.

Questions 28

The data governance team is reviewing code used for deleting records for compliance with GDPR. They note the following logic is used to delete records from the Delta Lake table named users.

Assuming that user_id is a unique identifying key and that delete_requests contains all users that have requested deletion, which statement describes whether successfully executing the above logic guarantees that the records to be deleted are no longer accessible and why?

Options:

A.  

Yes; Delta Lake ACID guarantees provide assurance that the delete command succeeded fully and permanently purged these records.

B.  

No; the Delta cache may return records from previous versions of the table until the cluster is restarted.

C.  

Yes; the Delta cache immediately updates to reflect the latest data files recorded to disk.

D.  

No; the Delta Lake delete command only provides ACID guarantees when combined with the merge into command.

E.  

No; files containing deleted records may still be accessible with time travel until a vacuum command is used to remove invalidated data files.

Questions 29

A Delta Lake table representing metadata about content posts from users has the following schema:

    user_id LONG

    post_text STRING

    post_id STRING

    longitude FLOAT

    latitude FLOAT

    post_time TIMESTAMP

    date DATE

Based on the above schema, which column is a good candidate for partitioning the Delta Table?

Options:

A.  

date

B.  

user_id

C.  

post_id

D.  

post_time

Questions 30

The data architect has decided that once data has been ingested from external sources into the Databricks Lakehouse, table access controls will be leveraged to manage permissions for all production tables and views.

The following logic was executed to grant privileges for interactive queries on a production database to the core engineering group.

GRANT USAGE ON DATABASE prod TO eng;

GRANT SELECT ON DATABASE prod TO eng;

Assuming these are the only privileges that have been granted to the eng group and that these users are not workspace administrators, which statement describes their privileges?

Options:

A.  

Group members have full permissions on the prod database and can also assign permissions to other users or groups.

B.  

Group members are able to list all tables in the prod database but are not able to see the results of any queries on those tables.

C.  

Group members are able to query and modify all tables and views in the prod database, but cannot create new tables or views.

D.  

Group members are able to query all tables and views in the prod database, but cannot create or edit anything in the database.

E.  

Group members are able to create, query, and modify all tables and views in the prod database, but cannot define custom functions.

Questions 31

In order to facilitate near real-time workloads, a data engineer is creating a helper function to leverage the schema detection and evolution functionality of Databricks Auto Loader. The desired function will automatically detect the schema of the source directory, incrementally process JSON files as they arrive in a source directory, and automatically evolve the schema of the table when new fields are detected.

The function is displayed below with a blank:

Which response correctly fills in the blank to meet the specified requirements?

Options:

A.  

Option A

B.  

Option B

C.  

Option C

D.  

Option D

E.  

Option E

Questions 32

A Delta Lake table representing metadata about content posts from users has the following schema:

user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE

This table is partitioned by the date column. A query is run with the following filter:

longitude < 20 & longitude > -20

Which statement describes how data will be filtered?

Options:

A.  

Statistics in the Delta Log will be used to identify partitions that might include files in the filtered range.

B.  

No file skipping will occur because the optimizer does not know the relationship between the partition column and the longitude.

C.  

The Delta Engine will use row-level statistics in the transaction log to identify the files that meet the filter criteria.

D.  

Statistics in the Delta Log will be used to identify data files that might include records in the filtered range.

E.  

The Delta Engine will scan the parquet file footers to identify each row that meets the filter criteria.
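
As background for the options above, skipping based on per-file minimum and maximum values can be sketched in plain Python. The file names, statistics, and the helper `may_match` are hypothetical; the sketch only shows the range-overlap test, not how any engine stores or reads its statistics.

```python
# Hypothetical per-file statistics: (file name, min longitude, max longitude).
file_stats = [
    ("part-0.parquet", -170.0, -100.0),
    ("part-1.parquet", -30.0, 10.0),
    ("part-2.parquet", 15.0, 60.0),
    ("part-3.parquet", 90.0, 150.0),
]


def may_match(min_val, max_val, low=-20.0, high=20.0):
    """True if [min_val, max_val] overlaps the open interval (low, high)."""
    return max_val > low and min_val < high


# Files whose value range cannot overlap the filter are skipped unread.
candidates = [name for name, lo, hi in file_stats if may_match(lo, hi)]
print(candidates)
```

A surviving file only might contain matching rows; the rows themselves still have to be read and filtered.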

Questions 33

The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings.

The team has decided to process all deletions from the previous week as a batch job at 1am each Sunday. The total duration of this job is less than one hour. Every Monday at 3am, a batch job executes a series of VACUUM commands on all Delta Lake tables throughout the organization.

The compliance officer has recently learned about Delta Lake's time travel functionality. They are concerned that this might allow continued access to deleted data.

Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?

Options:

A.  

Because the vacuum command permanently deletes all files containing deleted records, deleted records may be accessible with time travel for around 24 hours.

B.  

Because the default data retention threshold is 24 hours, data files containing deleted records will be retained until the vacuum job is run the following day.

C.  

Because Delta Lake time travel provides full access to the entire history of a table, deleted records can always be recreated by users with full admin privileges.

D.  

Because Delta Lake's delete statements have ACID guarantees, deleted records will be permanently purged from all storage systems as soon as a delete job completes.

E.  

Because the default data retention threshold is 7 days, data files containing deleted records will be retained until the vacuum job is run 8 days later.
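
The schedule in this question can be laid out as simple date arithmetic. The sketch below assumes a 7-day retention threshold and hypothetical calendar dates (5 January 2025 is a Sunday); the helper `removable` is illustrative and only compares a file's age against that threshold.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # assumed retention threshold

# Hypothetical calendar: the delete job starts Sunday 01:00 and takes < 1 hour.
delete_done = datetime(2025, 1, 5, 2, 0)    # Sunday, latest finish time
first_vacuum = datetime(2025, 1, 6, 3, 0)   # Monday 03:00, the next day
second_vacuum = first_vacuum + timedelta(days=7)


def removable(vacuum_time):
    """True once the invalidated files are at least RETENTION old."""
    return vacuum_time - delete_done >= RETENTION


print(removable(first_vacuum), removable(second_vacuum))
```

Under these assumptions the files are about one day old at the first vacuum and about eight days old at the second.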

Questions 34

A platform engineer is creating catalogs and schemas for the development team to use.

The engineer has created an initial catalog, catalog_A, and initial schema, schema_A. The engineer has also granted USE CATALOG, USE SCHEMA, and CREATE TABLE to the development team so that the engineer can begin populating the schema with new tables.

Despite being owner of the catalog and schema, the engineer noticed that they do not have access to the underlying tables in schema_A.

What explains the engineer's lack of access to the underlying tables?

Options:

A.  

The platform engineer needs to execute a REFRESH statement as the table permissions did not automatically update for owners.

B.  

Users granted with USE CATALOG can modify the owner's permissions to downstream tables.

C.  

The owner of the schema does not automatically have permission to tables within the schema, but can grant them to themselves at any point.

D.  

Permissions explicitly given by the table creator are the only way the platform engineer could access the underlying tables in their schema.

Questions 35

The data engineering team maintains the following code:

Assuming that this code produces logically correct results and the data in the source tables has been de-duplicated and validated, which statement describes what will occur when this code is executed?

Options:

A.  

A batch job will update the enriched_itemized_orders_by_account table, replacing only those rows that have different values than the current version of the table, using accountID as the primary key.

B.  

The enriched_itemized_orders_by_account table will be overwritten using the current valid version of data in each of the three tables referenced in the join logic.

C.  

An incremental job will leverage information in the state store to identify unjoined rows in the source tables and write these rows to the enriched_itemized_orders_by_account table.

D.  

An incremental job will detect if new rows have been written to any of the source tables; if new rows are detected, all results will be recalculated and used to overwrite the enriched_itemized_orders_by_account table.

E.  

No computation will occur until enriched_itemized_orders_by_account is queried; upon query materialization, results will be calculated using the current valid version of data in each of the three tables referenced in the join logic.

Questions 36

A team of data engineers is adding tables to a DLT pipeline that contain repetitive expectations for many of the same data quality checks.

One member of the team suggests reusing these data quality rules across all tables defined for this pipeline.

What approach would allow them to do this?

Options:

A.  

Maintain data quality rules in a Delta table outside of this pipeline’s target schema, providing the schema name as a pipeline parameter.

B.  

Use global Python variables to make expectations visible across DLT notebooks included in the same pipeline.

C.  

Add data quality constraints to tables in this pipeline using an external job with access to pipeline configuration files.

D.  

Maintain data quality rules in a separate Databricks notebook that each DLT notebook or file imports.

Questions 37

A table named user_ltv is being used to create a view that will be used by data analysts on various teams. Users in the workspace are configured into groups, which are used for setting up data access using ACLs.

The user_ltv table has the following schema:

An analyst who is not a member of the auditing group executes the following query:

Which result will be returned by this query?

Options:

A.  

All columns will be displayed normally for those records that have an age greater than 18; records not meeting this condition will be omitted.

B.  

All columns will be displayed normally for those records that have an age greater than 17; records not meeting this condition will be omitted.

C.  

All age values less than 18 will be returned as null values; all other columns will be returned with the values in user_ltv.

D.  

All records from all columns will be displayed with the values in user_ltv.

Questions 38

The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.

The following logic is used to process these records.

Which statement describes this implementation?

Options:

A.  

The customers table is implemented as a Type 3 table; old values are maintained as a new column alongside the current value.

B.  

The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.

C.  

The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.

D.  

The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.

E.  

The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.

Questions 39

The business intelligence team has a dashboard configured to track various summary metrics for retail stores. This includes total sales for the previous day alongside totals and averages for a variety of time periods. The fields required to populate this dashboard have the following schema:

For demand forecasting, the Lakehouse contains a validated table of all itemized sales updated incrementally in near real-time. This table, named products_per_order, includes the following fields:

Because reporting on long-term sales trends is less volatile, analysts using the new dashboard only require data to be refreshed once daily. Because the dashboard will be queried interactively by many users throughout a normal business day, it should return results quickly and reduce total compute associated with each materialization.

Which solution meets the expectations of the end users while controlling and limiting possible costs?

Options:

A.  

Use the Delta Cache to persist the products_per_order table in memory to quickly update the dashboard with each query.

B.  

Populate the dashboard by configuring a nightly batch job to save the required values to quickly update the dashboard with each query.

C.  

Use Structured Streaming to configure a live dashboard against the products_per_order table within a Databricks notebook.

D.  

Define a view against the products_per_order table and define the dashboard against this view.

Questions 40

A CHECK constraint has been successfully added to the Delta table named activity_details using the following logic:

A batch job is attempting to insert new records to the table, including a record where latitude = 45.50 and longitude = 212.67.

Which statement describes the outcome of this batch insert?

Options:

A.  

The write will fail when the violating record is reached; any records previously processed will be recorded to the target table.

B.  

The write will fail completely because of the constraint violation and no records will be inserted into the target table.

C.  

The write will insert all records except those that violate the table constraints; the violating records will be recorded to a quarantine table.

D.  

The write will include all records in the target table; any violations will be indicated in the boolean column named valid_coordinates.

E.  

The write will insert all records except those that violate the table constraints; the violating records will be reported in a warning log.
