Databricks Certified Professional Data Scientist Exam Questions and Answers

Last Update: May 2, 2024
Total Questions: 138

Questions 1

You are working on an email spam-filtering assignment. A new word, e.g. HadoopExam, appears in an email, and your solution has never come across this word before, so the estimated probability of this word appearing in either class of email could be zero. Which of the following algorithms can help you avoid a zero probability?

Options:

A.  

Naive Bayes

B.  

Laplace Smoothing

C.  

Logistic Regression

D.  

All of the above
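
To illustrate the zero-probability problem described in this question, here is a minimal Python sketch of add-one (Laplace) smoothing applied to per-class word counts. The word counts, the vocabulary, and the function name `smoothed_word_prob` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def smoothed_word_prob(word, class_word_counts, vocab_size, alpha=1.0):
    """Estimate P(word | class) with add-alpha smoothing (alpha=1 is Laplace)."""
    total = sum(class_word_counts.values())
    return (class_word_counts.get(word, 0) + alpha) / (total + alpha * vocab_size)


# Hypothetical word counts observed in spam training emails.
spam_counts = Counter({"offer": 3, "free": 2})
# Hypothetical vocabulary, including the previously unseen word.
vocab = {"offer", "free", "meeting", "HadoopExam"}

# Without smoothing (alpha=0) the unseen word gets probability zero,
# which would zero out the whole product in a Naive Bayes classifier.
p_unsmoothed = smoothed_word_prob("HadoopExam", spam_counts, len(vocab), alpha=0.0)

# With Laplace smoothing the unseen word gets a small non-zero probability.
p_smoothed = smoothed_word_prob("HadoopExam", spam_counts, len(vocab))

print(p_unsmoothed, p_smoothed)
```

The smoothed estimates still sum to 1 over the vocabulary, because the denominator grows by `alpha` for every vocabulary word.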

Questions 2

RMSE is a good measure of accuracy, but only for comparing the forecasting errors of different models for a ______, as it is scale-dependent.

Options:

A.  

Between Variables

B.  

Particular Variable

C.  

Among all the variables

D.  

All of the above are correct
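
The scale dependence mentioned in this question can be seen in a short Python sketch. The data values are hypothetical; the same measurements are expressed once in metres and once in centimetres, and the RMSE changes with the unit even though the forecasts are equally good.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observations and forecasts, in metres.
actual_m = [1.0, 2.0, 3.0]
predicted_m = [1.1, 1.9, 3.2]

# The same values expressed in centimetres.
actual_cm = [v * 100 for v in actual_m]
predicted_cm = [v * 100 for v in predicted_m]

rmse_m = rmse(actual_m, predicted_m)
rmse_cm = rmse(actual_cm, predicted_cm)

# The error is 100 times larger purely because of the change of unit.
print(rmse_m, rmse_cm)
```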

Questions 3

Scenario: Suppose that Bob can decide to go to work by one of three modes of transportation: car, bus, or commuter train. Because of high traffic, if he decides to go by car, there is a 50% chance he will be late. If he goes by bus, which has special reserved lanes but is sometimes overcrowded, the probability of being late is only 20%. The commuter train is almost never late, with a probability of only 1%, but is more expensive than the bus.

Suppose that Bob is late one day, and his boss wishes to estimate the probability that he drove to work that day by car. Since he does not know which mode of transportation Bob usually uses, he gives a prior probability of 1/3 to each of the three possibilities. Which of the following methods will the boss use to estimate the probability that Bob drove to work?

Options:

A.  

Naive Bayes

B.  

Linear regression

C.  

Random decision forests

D.  

None of the above
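
The arithmetic behind this scenario can be worked through with Bayes' rule. The following Python sketch uses only the priors and lateness probabilities stated in the scenario; the variable names are illustrative.

```python
# Prior probability of each mode of transportation (1/3 each, per the scenario).
priors = {"car": 1 / 3, "bus": 1 / 3, "train": 1 / 3}

# Probability of being late given each mode (from the scenario).
p_late_given = {"car": 0.50, "bus": 0.20, "train": 0.01}

# Total probability that Bob is late, summed over all modes.
p_late = sum(priors[mode] * p_late_given[mode] for mode in priors)

# Posterior probability of each mode given that Bob was late.
posterior = {
    mode: priors[mode] * p_late_given[mode] / p_late for mode in priors
}

# With equal priors this reduces to 0.50 / (0.50 + 0.20 + 0.01) for the car.
print(round(posterior["car"], 4))
```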

Questions 4

Which of the following are true with regard to the K-Means clustering algorithm?

Options:

A.  

Labels are not pre-assigned to each object in the cluster.

B.  

Labels are pre-assigned to each object in the cluster.

C.  

It classifies the data based on the labels.

D.  

It discovers the center of each cluster.

E.  

It finds which particular cluster each object falls into.
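
For readers who want to see the mechanics, here is a minimal pure-Python sketch of the standard K-Means loop (Lloyd's algorithm) on 2-D points. The sample points, the initial centers, and the helper names are hypothetical. Note that the input carries no labels: the algorithm alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
        for p in points
    ]


def kmeans(points, centers, iterations=10):
    """Run a fixed number of assignment/update rounds."""
    for _ in range(iterations):
        labels = assign(points, centers)
        updated = []
        for k, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                updated.append((mean_x, mean_y))
            else:
                # Keep the old center if no point was assigned to it.
                updated.append(old_center)
        centers = updated
    return centers, assign(points, centers)


# Hypothetical unlabeled points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, labels = kmeans(points, centers=[(0, 0), (10, 10)])
print(centers, labels)
```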

Questions 5

Refer to image below

Options:

A.  

Option A

B.  

Option B

C.  

Option C

D.  

Option D

Questions 6

Suppose there are three events. Which formula must always be equal to P(E1|E2,E3)?

Options:

A.  

P(E1,E2,E3)P(E1)/P(E2,E3)

B.  

P(E1,E2,E3)/P(E2,E3)

C.  

P(E1,E2|E3)P(E2|E3)P(E3)

D.  

P(E1,E2|E3)P(E3)

E.  

P(E1,E2,E3)P(E2)P(E3)
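
For reference, the definition of conditional probability applied to three events can be written as follows, assuming the conditioning events have non-zero joint probability:

```latex
\[
P(E_1 \mid E_2, E_3) \;=\; \frac{P(E_1, E_2, E_3)}{P(E_2, E_3)},
\qquad P(E_2, E_3) > 0 .
\]
```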

Questions 7

A researcher is interested in how variables such as GRE (Graduate Record Exam) scores, GPA (grade point average), and the prestige of the undergraduate institution affect admission into graduate school. The response variable, admit/don't admit, is a binary variable.

The above is an example of:

Options:

A.  

Linear Regression

B.  

Logistic Regression

C.  

Recommendation system

D.  

Maximum likelihood estimation

E.  

Hierarchical linear models
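
As a rough illustration of how a binary outcome such as admit/don't admit can be modelled from numeric predictors, the sketch below passes a linear combination of the predictors through the logistic (sigmoid) function to obtain a probability between 0 and 1. The coefficient values and the function name `admit_probability` are hypothetical and are not fitted to any real data.

```python
import math


def admit_probability(gre, gpa, prestige_rank, coef):
    """Map a linear combination of predictors to a probability via the sigmoid."""
    z = (
        coef["intercept"]
        + coef["gre"] * gre
        + coef["gpa"] * gpa
        + coef["rank"] * prestige_rank
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only.
coef = {"intercept": -3.5, "gre": 0.002, "gpa": 0.8, "rank": -0.5}

p_base = admit_probability(gre=700, gpa=3.6, prestige_rank=2, coef=coef)
p_higher_gpa = admit_probability(gre=700, gpa=3.9, prestige_rank=2, coef=coef)

# The output is always a probability; a threshold such as 0.5 turns it
# into an admit / don't admit decision.
print(p_base, p_higher_gpa)
```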

Questions 8

What is one modeling or descriptive statistical function in MADlib that is typically not provided in a standard relational database?

Options:

A.  

Expected value

B.  

Variance

C.  

Linear regression

D.  

Quantiles

Questions 9

Select the algorithm that is an unsupervised learning algorithm.

Options:

A.  

K-Nearest Neighbors

B.  

K-Means

C.  

Support Vector Machines

D.  

Naive Bayes

Questions 10

There are 5000 different colored balls, of which 1200 are pink. What is the maximum likelihood estimate for the proportion of "pink" items in the test set of colored balls?

Options:

A.  

2.4

B.  

24 0

C.  

.24

D.  

.48

E.  

4.8
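
For a proportion, the maximum likelihood estimate under a binomial model is the observed count divided by the total. A minimal Python sketch using the numbers from the question (the variable names are illustrative):

```python
# Counts taken from the question.
pink_count = 1200
total_count = 5000

# Under a binomial model, the MLE of a proportion is the sample proportion.
mle_pink = pink_count / total_count

print(mle_pink)
```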

Questions 11

In which of the following scenarios should you apply Bayes' Theorem?

Options:

A.  

The sample space is partitioned into a set of mutually exclusive events {A1, A2, ..., An}.

B.  

Within the sample space, there exists an event B, for which P(B) > 0.

C.  

The analytical goal is to compute a conditional probability of the form: P(Ak | B ).

D.  

In all of the above cases
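
For reference, Bayes' Theorem in the notation of the options above can be written as follows, assuming the events A1, ..., An are mutually exclusive, together cover the sample space, and P(B) > 0:

```latex
\[
P(A_k \mid B) \;=\; \frac{P(A_k)\, P(B \mid A_k)}{\sum_{j=1}^{n} P(A_j)\, P(B \mid A_j)},
\qquad P(B) > 0 .
\]
```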

Questions 12

Your customer provided you with 2,000 unlabeled records to be divided into three groups. What is the correct analytical method to use?

Options:

A.  

Semi Linear Regression

B.  

Logistic regression

C.  

Naive Bayesian classification

D.  

Linear regression

E.  

K-means clustering

Questions 13

Select the statements that apply correctly to Naive Bayes.

Options:

A.  

Works with a small amount of data

B.  

Sensitive to how the input data is prepared

C.  

Works with nominal values

Questions 14

Identify the classifier that assumes independence among all its features.

Options:

A.  

Neural networks

B.  

Linear Regression

C.  

Naive Bayes

D.  

Random forests

Questions 15

Which of the following is a correct example of the target variable in regression (supervised learning)?

Options:

A.  

Nominal values like true, false

B.  

Reptile, fish, mammal, amphibian, plant, fungi

C.  

An infinite number of numeric values, such as 0.100, 42.001, 1000.743, ...

D.  

All of the above

Questions 16

You are working on a Data Science project, and during the project you have been given the responsibility of interviewing all the stakeholders. Which phase of the project are you in?

Options:

A.  

Discovery

B.  

Data Preparations

C.  

Creating Models

D.  

Executing Models

E.  

Creating visuals from the outcome

F.  

Operationalize the models

Questions 17

Refer to the exhibit.

You are building a decision tree. In this exhibit, four variables are listed with their respective values of info-gain.

Based on this information, on which attribute would you expect the next split to be in the decision tree?

Options:

A.  

Credit Score

B.  

Age

C.  

Income

D.  

Gender
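
The exhibit with the info-gain values is not reproduced here. As background, the following Python sketch shows how information gain is typically computed for candidate attributes and how the attribute with the highest gain is selected for a split. The rows, attribute names, and target column are hypothetical and do not come from the exhibit.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical training rows, for illustration only.
rows = [
    {"income": "high", "gender": "f", "approved": "yes"},
    {"income": "high", "gender": "m", "approved": "yes"},
    {"income": "low", "gender": "f", "approved": "no"},
    {"income": "low", "gender": "m", "approved": "no"},
]

gains = {attr: info_gain(rows, attr, "approved") for attr in ("income", "gender")}

# A greedy decision-tree learner splits on the attribute with the largest gain.
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```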

Questions 18

What is the best way to evaluate the quality of the model found by an unsupervised algorithm like k-means clustering, given metrics for the cost of the clustering (how well it fits the data) and its stability (how similar the clusters are across multiple runs over the same data)?

Options:

A.  

The lowest cost clustering subject to a stability constraint

B.  

The lowest cost clustering

C.  

The most stable clustering subject to a minimal cost constraint

D.  

The most stable clustering

Questions 19

Clustering is a type of unsupervised learning with which of the following goals?

Options:

A.  

Maximize a utility function

B.  

Find similarities in the training data

C.  

Not to maximize a utility function

D.  

A and B

E.  

B and C

Questions 20

What describes a true limitation of the Logistic Regression method?

Options:

A.  

It does not handle redundant variables well.

B.  

It does not handle missing values well.

C.  

It does not handle correlated variables well.

D.  

It does not have explanatory values.
