
Databricks Certified Associate Developer for Apache Spark 3.0 Exam Question and Answers

Last Update: Apr 28, 2024
Total Questions: 180

Question 1

The code block displayed below contains an error. The code block should read the CSV file located at path data/transactions.csv into DataFrame transactionsDf, using the first row as the column header and casting the columns to the most appropriate type. Find the error.

First 3 rows of transactions.csv:

transactionId;storeId;productId;name
1;23;12;green grass
2;35;31;yellow sun
3;23;12;green grass

Code block:

transactionsDf = spark.read.load("data/transactions.csv", sep=";", format="csv", header=True)

Options:

A.  

The DataFrameReader is not accessed correctly.

B.  

The transaction is evaluated lazily, so no file will be read.

C.  

Spark is unable to understand the file type.

D.  

The code block is unable to capture all columns.

E.  

The resulting DataFrame will not have the appropriate schema.
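
For reference only (not tied to whichever option this exam marks correct): reading a semicolon-separated CSV so that the first row becomes the column names and column types are inferred is commonly written with the header and inferSchema options enabled. A minimal sketch, assuming a standard SparkSession named spark:

# Sketch: the read from the question, extended with schema inference
transactionsDf = spark.read.load("data/transactions.csv", sep=";", format="csv",
                                 header=True, inferSchema=True)

# Equivalent shorthand using the csv() convenience method
transactionsDf = spark.read.csv("data/transactions.csv", sep=";",
                                header=True, inferSchema=True)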

Question 2

The code block shown below should return a two-column DataFrame with columns transactionId and supplier, with combined information from DataFrames itemsDf and transactionsDf. The code block should merge rows in which column productId of DataFrame transactionsDf matches the value of column itemId in DataFrame itemsDf, but only where column storeId of DataFrame transactionsDf does not match column itemId of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(itemsDf, __2__).__3__(__4__)

Options:

A.  

1. join

2. transactionsDf.productId==itemsDf.itemId, how="inner"

3. select

4. "transactionId", "supplier"

B.  

1. select

2. "transactionId", "supplier"

3. join

4. [transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId]

C.  

1. join

2. [transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId]

3. select

4. "transactionId", "supplier"

D.  

1. filter

2. "transactionId", "supplier"

3. join

4. "transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId"

E.  

1. join

2. transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId

3. filter

4. "transactionId", "supplier"

Question 3

Which of the following code blocks concatenates rows of DataFrames transactionsDf and transactionsNewDf, omitting any duplicates?

Options:

A.  

transactionsDf.concat(transactionsNewDf).unique()

B.  

transactionsDf.union(transactionsNewDf).distinct()

C.  

spark.union(transactionsDf, transactionsNewDf).distinct()

D.  

transactionsDf.join(transactionsNewDf, how="union").distinct()

E.  

transactionsDf.union(transactionsNewDf).unique()
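
For reference only: the usual PySpark idiom for concatenating two DataFrames with identical schemas and removing duplicate rows combines union() with distinct(). A minimal sketch:

# union() appends rows by column position; distinct() then drops duplicate rows
combinedDf = transactionsDf.union(transactionsNewDf).distinct()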

Question 4

Which of the following code blocks reads in the JSON file stored at filePath, enforcing the schema expressed in JSON format in variable json_schema, shown in the code block below?

Code block:

json_schema = """
{"type": "struct",
 "fields": [
 {
 "name": "itemId",
 "type": "integer",
 "nullable": true,
 "metadata": {}
 },
 {
 "name": "supplier",
 "type": "string",
 "nullable": true,
 "metadata": {}
 }
 ]
}
"""

Options:

A.  

spark.read.json(filePath, schema=json_schema)

B.  

spark.read.schema(json_schema).json(filePath)

C.  

schema = StructType.fromJson(json.loads(json_schema))
spark.read.json(filePath, schema=schema)

D.  

spark.read.json(filePath, schema=schema_of_json(json_schema))

E.  

spark.read.json(filePath, schema=spark.read.json(json_schema))
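
For reference only: a JSON-formatted schema string is typically converted into a StructType with StructType.fromJson() before being passed to the reader. A minimal sketch, assuming json_schema holds the string shown above and filePath points at the JSON file:

import json
from pyspark.sql.types import StructType

# Parse the JSON string into a dict, build a StructType, and enforce it on the read
schema = StructType.fromJson(json.loads(json_schema))
df = spark.read.json(filePath, schema=schema)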

Question 5

The code block shown below should add a column itemNameBetweenSeparators to DataFrame itemsDf. The column should contain arrays of maximum 4 strings. The arrays should be composed of the values in column itemName, which are separated at - or whitespace characters. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Sample of DataFrame itemsDf:

+------+----------------------------------+-------------------+
|itemId|itemName                          |supplier           |
+------+----------------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |YetiX              |
|3     |Outdoors Backpack                 |Sports Company Inc.|
+------+----------------------------------+-------------------+

Code block:

itemsDf.__1__(__2__, __3__(__4__, "[\s\-]", __5__))

Options:

A.  

1. withColumn

2. "itemNameBetweenSeparators"

3. split

4. "itemName"

5. 4

(Correct)

B.  

1. withColumnRenamed

2. "itemNameBetweenSeparators"

3. split

4. "itemName"

5. 4

C.  

1. withColumnRenamed

2. "itemName"

3. split

4. "itemNameBetweenSeparators"

5. 4

D.  

1. withColumn

2. "itemNameBetweenSeparators"

3. split

4. "itemName"

5. 5

E.  

1. withColumn

2. itemNameBetweenSeparators

3. str_split

4. "itemName"

5. 5
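
For reference only: split() from pyspark.sql.functions takes an optional limit argument (available since Spark 3.0) that caps the number of array elements, and withColumn() attaches the result as a new column. A minimal sketch using the column names from the question:

from pyspark.sql.functions import split

# New array column: itemName split at "-" or whitespace, at most 4 elements per array
itemsDf = itemsDf.withColumn("itemNameBetweenSeparators",
                             split("itemName", r"[\s\-]", 4))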

Question 6

The code block shown below should return a DataFrame with columns transactionId, predError, value, and f from DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__)

Options:

A.  

1. filter

2. "transactionId", "predError", "value", "f"

B.  

1. select

2. "transactionId, predError, value, f"

C.  

1. select

2. ["transactionId", "predError", "value", "f"]

D.  

1. where

2. col("transactionId"), col("predError"), col("value"), col("f")

E.  

1. select

2. col(["transactionId", "predError", "value", "f"])

Question 7

Which of the following describes a difference between Spark's cluster and client execution modes?

Options:

A.  

In cluster mode, the cluster manager resides on a worker node, while it resides on an edge node in client mode.

B.  

In cluster mode, executor processes run on worker nodes, while they run on gateway nodes in client mode.

C.  

In cluster mode, the driver resides on a worker node, while it resides on an edge node in client mode.

D.  

In cluster mode, a gateway machine hosts the driver, while it is co-located with the executor in client mode.

E.  

In cluster mode, the Spark driver is not co-located with the cluster manager, while it is co-located in client mode.

Question 8

The code block displayed below contains an error. The code block should return DataFrame transactionsDf, but with the column storeId renamed to storeNumber. Find the error.

Code block:

transactionsDf.withColumn("storeNumber", "storeId")

Options:

A.  

Instead of withColumn, the withColumnRenamed method should be used.

B.  

Arguments "storeNumber" and "storeId" each need to be wrapped in a col() operator.

C.  

Argument "storeId" should be the first and argument "storeNumber" should be the second argument to the withColumn method.

D.  

The withColumn operator should be replaced with the copyDataFrame operator.

E.  

Instead of withColumn, the withColumnRenamed method should be used and argument "storeId" should be the first and argument "storeNumber" should be the second argument to that method.
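
For reference only: renaming a column is normally done with withColumnRenamed(), which takes the existing column name first and the new name second. A minimal sketch:

# Rename storeId to storeNumber; all other columns are carried over unchanged
transactionsDf = transactionsDf.withColumnRenamed("storeId", "storeNumber")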

Question 9

The code block displayed below contains an error. The code block should save DataFrame transactionsDf at path path as a parquet file, appending to any existing parquet file. Find the error.

Code block:

transactionsDf.format("parquet").option("mode", "append").save(path)

Options:

A.  

The code block is missing a reference to the DataFrameWriter.

B.  

save() is evaluated lazily and needs to be followed by an action.

C.  

The mode option should be omitted so that the command uses the default mode.

D.  

The code block is missing a bucketBy command that takes care of partitions.

E.  

Given that the DataFrame should be saved as a parquet file, path is being passed to the wrong method.
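
For reference only: appending a DataFrame to an existing parquet location usually goes through the DataFrameWriter returned by the .write attribute, with the save mode set via mode(). A minimal sketch, assuming path holds the target location:

# .write returns a DataFrameWriter; mode("append") adds to any existing parquet data
transactionsDf.write.format("parquet").mode("append").save(path)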

Question 10

Which of the following is the idea behind dynamic partition pruning in Spark?

Options:

A.  

Dynamic partition pruning is intended to skip over the data you do not need in the results of a query.

B.  

Dynamic partition pruning concatenates columns of similar data types to optimize join performance.

C.  

Dynamic partition pruning performs wide transformations on disk instead of in memory.

D.  

Dynamic partition pruning reoptimizes physical plans based on data types and broadcast variables.

E.  

Dynamic partition pruning reoptimizes query plans based on runtime statistics collected during query execution.

Question 11

The code block displayed below contains multiple errors. The code block should return a DataFrame that contains only columns transactionId, predError, value and storeId of DataFrame transactionsDf. Find the errors.

Code block:

transactionsDf.select([col(productId), col(f)])

Sample of transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

Options:

A.  

The column names should be listed directly as arguments to the operator and not as a list.

B.  

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.

C.  

The select operator should be replaced by a drop operator.

D.  

The column names should be listed directly as arguments to the operator and not as a list and following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.

E.  

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.

Question 12

The code block displayed below contains an error. The code block should create DataFrame itemsAttributesDf which has columns itemId and attribute and lists every attribute from the attributes column in DataFrame itemsDf next to the itemId of the respective row in itemsDf. Find the error.

A sample of DataFrame itemsDf is below.

Code block:

itemsAttributesDf = itemsDf.explode("attributes").alias("attribute").select("attribute", "itemId")

Options:

A.  

Since itemId is the index, it does not need to be an argument to the select() method.

B.  

The alias() method needs to be called after the select() method.

C.  

The explode() method expects a Column object rather than a string.

D.  

explode() is not a method of DataFrame. explode() should be used inside the select() method instead.

E.  

The split() method should be used inside the select() method instead of the explode() method.
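
For reference only: explode() is a function from pyspark.sql.functions rather than a DataFrame method, so it is normally used inside select(), with alias() naming the generated column. A minimal sketch, assuming itemsDf has columns itemId and attributes:

from pyspark.sql.functions import explode

# One output row per element of the attributes array, paired with the row's itemId
itemsAttributesDf = itemsDf.select("itemId", explode("attributes").alias("attribute"))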

Question 13

Which of the following code blocks returns a single-column DataFrame showing the number of words in column supplier of DataFrame itemsDf?

Sample of DataFrame itemsDf:

+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+

Options:

A.  

itemsDf.split("supplier", " ").count()

B.  

itemsDf.split("supplier", " ").size()

C.  

itemsDf.select(word_count("supplier"))

D.  

spark.select(size(split(col(supplier), " ")))

E.  

itemsDf.select(size(split("supplier", " ")))
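
For reference only: a per-row word count is commonly expressed by splitting the string on whitespace and taking the size of the resulting array. A minimal sketch using the sample's column name:

from pyspark.sql.functions import size, split

# Single-column DataFrame: the number of whitespace-separated words in supplier
wordCountDf = itemsDf.select(size(split("supplier", " ")))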

Question 14

The code block displayed below contains an error. The code block should produce a DataFrame with color as the only column and three rows with color values of red, blue, and green, respectively. Find the error.

Code block:

spark.createDataFrame([("red",), ("blue",), ("green",)], "color")

Options:

A.  

Instead of calling spark.createDataFrame, just DataFrame should be called.

B.  

The commas in the tuples with the colors should be eliminated.

C.  

The colors red, blue, and green should be expressed as a simple Python list, and not a list of tuples.

D.  

Instead of color, a data type should be specified.

E.  

The "color" expression needs to be wrapped in brackets, so it reads ["color"].

Question 15

The code block displayed below contains multiple errors. The code block should remove column transactionDate from DataFrame transactionsDf and add a column transactionTimestamp in which dates that are expressed as strings in column transactionDate of DataFrame transactionsDf are converted into unix timestamps. Find the errors.

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+----------------+
|transactionId|predError|value|storeId|productId|   f| transactionDate|
+-------------+---------+-----+-------+---------+----+----------------+
|            1|        3|    4|     25|        1|null|2020-04-26 15:35|
|            2|        6|    7|      2|        2|null|2020-04-13 22:01|
|            3|        3| null|     25|        3|null|2020-04-02 10:53|
+-------------+---------+-----+-------+---------+----+----------------+

Code block:

transactionsDf = transactionsDf.drop("transactionDate")
transactionsDf["transactionTimestamp"] = unix_timestamp("transactionDate", "yyyy-MM-dd")

Options:

A.  

Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment. Operator to_unixtime() should be used instead of unix_timestamp().

B.  

Column transactionDate should be dropped after transactionTimestamp has been written. The withColumn operator should be used instead of the existing column assignment. Column transactionDate should be wrapped in a col() operator.

C.  

Column transactionDate should be wrapped in a col() operator.

D.  

The string indicating the date format should be adjusted. The withColumnReplaced operator should be used instead of the drop and assign pattern in the code block to replace column transactionDate with the new column transactionTimestamp.

E.  

Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment.
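
For reference only: the usual pattern is to derive the new column with withColumn() and unix_timestamp() before dropping the source column, using a format string that matches the data. A minimal sketch, with the format assumed from the sample above:

from pyspark.sql.functions import unix_timestamp

# Convert the string dates to unix timestamps first, then drop the original column
transactionsDf = (transactionsDf
                  .withColumn("transactionTimestamp",
                              unix_timestamp("transactionDate", "yyyy-MM-dd HH:mm"))
                  .drop("transactionDate"))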

Question 16

The code block displayed below contains at least one error. The code block should return a DataFrame with only one column, result. That column should include all values in column value from DataFrame transactionsDf raised to the power of 5, and a null value for rows in which there is no value in column value. Find the error(s).

Code block:

from pyspark.sql.functions import udf
from pyspark.sql import types as T

transactionsDf.createOrReplaceTempView('transactions')

def pow_5(x):
    return x**5

spark.udf.register(pow_5, 'power_5_udf', T.LongType())
spark.sql('SELECT power_5_udf(value) FROM transactions')

Options:

A.  

The pow_5 method is unable to handle empty values in column value and the name of the column in the returned DataFrame is not result.

B.  

The returned DataFrame includes multiple columns instead of just one column.

C.  

The pow_5 method is unable to handle empty values in column value, the name of the column in the returned DataFrame is not result, and the SparkSession cannot access the transactionsDf DataFrame.

D.  

The pow_5 method is unable to handle empty values in column value, the name of the column in the returned DataFrame is not result, and Spark driver does not call the UDF function appropriately.

E.  

The pow_5 method is unable to handle empty values in column value, the UDF function is not registered properly with the Spark driver, and the name of the column in the returned DataFrame is not result.
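
For reference only: spark.udf.register() expects the SQL-visible name as its first argument and the Python callable as its second. A minimal sketch of a null-tolerant variant (illustrative, not an answer key; the result column is aliased to match the requested name):

from pyspark.sql import types as T

def pow_5(x):
    # Keep null rows null instead of raising a TypeError
    return x ** 5 if x is not None else None

# Name first, callable second, then the SQL return type
spark.udf.register('power_5_udf', pow_5, T.LongType())
resultDf = spark.sql('SELECT power_5_udf(value) AS result FROM transactions')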

Question 17

The code block shown below should convert up to 5 rows in DataFrame transactionsDf that have the value 25 in column storeId into a Python list. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__(__4__)

Options:

A.  

1. filter

2. "storeId"==25

3. collect

4. 5

B.  

1. filter

2. col("storeId")==25

3. toLocalIterator

4. 5

C.  

1. select

2. storeId==25

3. head

4. 5

D.  

1. filter

2. col("storeId")==25

3. take

4. 5

E.  

1. filter

2. col("storeId")==25

3. collect

4. 5
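
For reference only: take(n) returns up to n rows as a Python list of Row objects, which is the usual way to pull a bounded number of rows to the driver (collect() would return all matching rows). A minimal sketch:

from pyspark.sql.functions import col

# Keep rows with storeId == 25, then bring at most 5 of them back as a Python list
rows = transactionsDf.filter(col("storeId") == 25).take(5)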

Question 18

Which of the following describes a narrow transformation?

Options:

A.  

A narrow transformation is an operation in which data is exchanged across partitions.

B.  

A narrow transformation is a process in which data from multiple RDDs is used.

C.  

A narrow transformation is a process in which 32-bit float variables are cast to smaller float variables, like 16-bit or 8-bit float variables.

D.  

A narrow transformation is an operation in which data is exchanged across the cluster.

E.  

A narrow transformation is an operation in which no data is exchanged across the cluster.

Question 19

The code block displayed below contains an error. The code block should configure Spark so that DataFrames up to a size of 20 MB will be broadcast to all worker nodes when performing a join. Find the error.

Code block:

spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 20)

Options:

A.  

Spark will only broadcast DataFrames that are much smaller than the default value.

B.  

The correct option to write configurations is through spark.config and not spark.conf.

C.  

Spark will only apply the limit to threshold joins and not to other joins.

D.  

The passed limit has the wrong variable type.

E.  

The command is evaluated lazily and needs to be followed by an action.
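
For reference only: spark.sql.autoBroadcastJoinThreshold is interpreted in bytes, so a 20 MB limit is normally written as the corresponding byte count. A minimal sketch:

# 20 MB expressed in bytes: 20 * 1024 * 1024
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 20 * 1024 * 1024)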

Question 20

Which of the following code blocks returns a copy of DataFrame transactionsDf that only includes columns transactionId, storeId, productId and f?

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

Options:

A.  

transactionsDf.drop(col("value"), col("predError"))

B.  

transactionsDf.drop("predError", "value")

C.  

transactionsDf.drop(value, predError)

D.  

transactionsDf.drop(["predError", "value"])

E.  

transactionsDf.drop([col("predError"), col("value")])

Question 21

The code block shown below should show information about the data type that column storeId of DataFrame transactionsDf contains. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__

Options:

A.  

1. select

2. "storeId"

3. print_schema()

B.  

1. limit

2. 1

3. columns

C.  

1. select

2. "storeId"

3. printSchema()

D.  

1. limit

2. "storeId"

3. printSchema()

E.  

1. select

2. storeId

3. dtypes
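
For reference only: two common ways to inspect the data type of a single column are printing the schema of a one-column projection, or reading the dtypes list. A minimal sketch:

# Print the schema of just the storeId column
transactionsDf.select("storeId").printSchema()

# Or inspect the (name, type) pairs programmatically
print(transactionsDf.select("storeId").dtypes)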

Question 22

Which of the following describes slots?

Options:

A.  

Slots are dynamically created and destroyed in accordance with an executor's workload.

B.  

To optimize I/O performance, Spark stores data on disk in multiple slots.

C.  

A Java Virtual Machine (JVM) working as an executor can be considered as a pool of slots for task execution.

D.  

A slot is always limited to a single core.

E.  

Slots are the communication interface for executors and are used for receiving commands and sending results to the driver.

Question 23

Which of the following statements about RDDs is incorrect?

Options:

A.  

An RDD consists of a single partition.

B.  

The high-level DataFrame API is built on top of the low-level RDD API.

C.  

RDDs are immutable.

D.  

RDD stands for Resilient Distributed Dataset.

E.  

RDDs are great for precisely instructing Spark on how to do a query.

Question 24

Which of the following code blocks returns a new DataFrame in which column attributes of DataFrame itemsDf is renamed to feature0 and column supplier to feature1?

Options:

A.  

itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1)

B.  

itemsDf.withColumnRenamed("attributes", "feature0")
itemsDf.withColumnRenamed("supplier", "feature1")

C.  

itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1"))

D.  

itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")

E.  

itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1")

Question 25

Which of the following code blocks returns the number of unique values in column storeId of DataFrame transactionsDf?

Options:

A.  

transactionsDf.select("storeId").dropDuplicates().count()

B.  

transactionsDf.select(count("storeId")).dropDuplicates()

C.  

transactionsDf.select(distinct("storeId")).count()

D.  

transactionsDf.dropDuplicates().agg(count("storeId"))

E.  

transactionsDf.distinct().select("storeId").count()
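
For reference only: counting distinct values of one column is usually written by projecting the column, de-duplicating, and counting, or with the countDistinct aggregate. A minimal sketch:

from pyspark.sql.functions import countDistinct

# Project, de-duplicate, count
numStores = transactionsDf.select("storeId").dropDuplicates().count()

# Equivalent aggregate form
transactionsDf.select(countDistinct("storeId")).show()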

Question 26

The code block displayed below contains an error. The code block should return a copy of DataFrame transactionsDf where the name of column transactionId has been changed to transactionNumber. Find the error.

Code block:

transactionsDf.withColumn("transactionNumber", "transactionId")

Options:

A.  

The arguments to the withColumn method need to be reordered.

B.  

The arguments to the withColumn method need to be reordered and the copy() operator should be appended to the code block to ensure a copy is returned.

C.  

The copy() operator should be appended to the code block to ensure a copy is returned.

D.  

Each column name needs to be wrapped in the col() method and method withColumn should be replaced by method withColumnRenamed.

E.  

The method withColumn should be replaced by method withColumnRenamed and the arguments to the method need to be reordered.

Question 27

Which of the following statements about broadcast variables is correct?

Options:

A.  

Broadcast variables are serialized with every single task.

B.  

Broadcast variables are commonly used for tables that do not fit into memory.

C.  

Broadcast variables are immutable.

D.  

Broadcast variables are occasionally dynamically updated on a per-task basis.

E.  

Broadcast variables are local to the worker node and not shared across the cluster.
