Google Professional Machine Learning Engineer Question and Answers

Google Professional Machine Learning Engineer

Last Update May 5, 2024
Total Questions : 268

We are offering FREE Professional-Machine-Learning-Engineer Google exam questions. All you do is to just go and sign up. Give your details, prepare Professional-Machine-Learning-Engineer free exam questions and then go for complete pool of Google Professional Machine Learning Engineer test questions that will help you more.

Professional-Machine-Learning-Engineer PDF

$35 ~~$99.99~~

Add to Cart

Professional-Machine-Learning-Engineer Engine

Professional-Machine-Learning-Engineer Testing Engine

$42 ~~$119.99~~

Add to Cart

Professional-Machine-Learning-Engineer PDF + Engine

Professional-Machine-Learning-Engineer PDF + Testing Engine

$56 ~~$159.99~~

Add to Cart

Questions 1

Your organization's call center has asked you to develop a model that analyzes customer sentiments in each call. The call center receives over one million calls daily, and data is stored in Cloud Storage. The data collected must not leave the region in which the call originated, and no Personally Identifiable Information (Pll) can be stored or analyzed. The data science team has a third-party tool for visualization and access which requires a SQL ANSI-2011 compliant interface. You need to select components for data processing and for analytics. How should the data pipeline be designed?

Options:

1 = Dataflow, 2 = BigQuery

1 = Pub/Sub, 2 = Datastore

1 = Dataflow, 2 = Cloud SQL

1 = Cloud Function, 2 = Cloud SQL

Discussion 0

Questions 2

You are building a real-time prediction engine that streams files which may contain Personally Identifiable Information (Pll) to Google Cloud. You want to use the Cloud Data Loss Prevention (DLP) API to scan the files. How should you ensure that the Pll is not accessible by unauthorized individuals?

Options:

Stream all files to Google CloudT and then write the data to BigQuery Periodically conduct a bulk scan of the table using the DLP API.

Stream all files to Google Cloud, and write batches of the data to BigQuery While the data is being written to BigQuery conduct a bulk scan of the data using the DLP API.

Create two buckets of data Sensitive and Non-sensitive Write all data to the Non-sensitive bucket Periodically conduct a bulk scan of that bucket using the DLP API, and move the sensitive data to the Sensitive bucket

Create three buckets of data: Quarantine, Sensitive, and Non-sensitive Write all data to the Quarantine bucket.

Periodically conduct a bulk scan of that bucket using the DLP API, and move the data to either the Sensitive or Non-Sensitive bucket

Discussion 0

Questions 3

You need to design a customized deep neural network in Keras that will predict customer purchases based on their purchase history. You want to explore model performance using multiple model architectures, store training data, and be able to compare the evaluation metrics in the same dashboard. What should you do?

Options:

Create multiple models using AutoML Tables

Automate multiple training runs using Cloud Composer

Run multiple training jobs on Al Platform with similar job names

Create an experiment in Kubeflow Pipelines to organize multiple runs

Discussion 0

Questions 4

You developed a custom model by using Vertex Al to forecast the sales of your company s products based on historical transactional data You anticipate changes in the feature distributions and the correlations between the features in the near future You also expect to receive a large volume of prediction requests You plan to use Vertex Al Model Monitoring for drift detection and you want to minimize the cost. What should you do?

Options:

Use the features for monitoring Set a monitoring- frequency value that is higher than the default.

Use the features for monitoring Set a prediction-sampling-rare value that is closer to 1 than 0.

Use the features and the feature attributions for monitoring. Set a monitoring-frequency value that is lower than the default.

Use the features and the feature attributions for monitoring Set a prediction-sampling-rate value that is closer to 0 than 1.

Discussion 0

Questions 5

Your team trained and tested a DNN regression model with good results. Six months after deployment, the model is performing poorly due to a change in the distribution of the input data. How should you address the input differences in production?

Options:

Create alerts to monitor for skew, and retrain the model.

Perform feature selection on the model, and retrain the model with fewer features

Retrain the model, and select an L2 regularization parameter with a hyperparameter tuning service

Perform feature selection on the model, and retrain the model on a monthly basis with fewer features

Discussion 0

Questions 6

You are working with a dataset that contains customer transactions. You need to build an ML model to predict customer purchase behavior You plan to develop the model in BigQuery ML, and export it to Cloud Storage for online prediction You notice that the input data contains a few categorical features, including product category and payment method You want to deploy the model as quickly as possible. What should you do?

Options:

Use the transform clause with the ML. ONE_HOT_ENCODER function on the categorical features at model creation and select the categorical and non-categorical features.

Use the ML. ONE_HOT_ENCODER function on the categorical features, and select the encoded categorical features and non-categorical features as inputs to create your model.

Use the create model statement and select the categorical and non-categorical features.

Use the ML. ONE_HOT_ENCODER function on the categorical features, and select the encoded categorical features and non-categorical features as inputs to create your model.

Discussion 0

Questions 7

You are building a linear model with over 100 input features, all with values between -1 and 1. You suspect that many features are non-informative. You want to remove the non-informative features from your model while keeping the informative ones in their original form. Which technique should you use?

Options:

Use Principal Component Analysis to eliminate the least informative features.

Use L1 regularization to reduce the coefficients of uninformative features to 0.

After building your model, use Shapley values to determine which features are the most informative.

Use an iterative dropout technique to identify which features do not degrade the model when removed.

Discussion 0

Questions 8

You work for a hospital that wants to optimize how it schedules operations. You need to create a model that uses the relationship between the number of surgeries scheduled and beds used You want to predict how many beds will be needed for patients each day in advance based on the scheduled surgeries You have one year of data for the hospital organized in 365 rows

The data includes the following variables for each day

• Number of scheduled surgeries

• Number of beds occupied

• Date

You want to maximize the speed of model development and testing What should you do?

Options:

Create a BigQuery table Use BigQuery ML to build a regression model, with number of beds as the target variable and number of scheduled surgeries and date features (such as day of week) as the predictors

Create a BigQuery table Use BigQuery ML to build an ARIMA model, with number of beds as the target variable and date as the time variable.

Create a Vertex Al tabular dataset Tram an AutoML regression model, with number of beds as the target variable and number of scheduled minor surgeries and date features (such as day of the week) as the predictors

Create a Vertex Al tabular dataset Train a Vertex Al AutoML Forecasting model with number of beds as the target variable, number of scheduled surgeries as a covariate, and date as the time variable.

Discussion 0

Questions 9

You work on a team that builds state-of-the-art deep learning models by using the TensorFlow framework. Your team runs multiple ML experiments each week which makes it difficult to track the experiment runs. You want a simple approach to effectively track, visualize and debug ML experiment runs on Google Cloud while minimizing any overhead code. How should you proceed?

Options:

Set up Vertex Al Experiments to track metrics and parameters Configure Vertex Al TensorBoard for visualization.

Set up a Cloud Function to write and save metrics files to a Cloud Storage Bucket Configure a Google Cloud VM to host TensorBoard locally for visualization.

Set up a Vertex Al Workbench notebook instance Use the instance to save metrics data in a Cloud Storage bucket and to host TensorBoard locally for visualization.

Set up a Cloud Function to write and save metrics files to a BigQuery table. Configure a Google Cloud VM to host TensorBoard locally for visualization.

Discussion 0

Questions 10

You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models, and versions in a clean and scalable way. Which strategy should you choose?

Options:

Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.

Separate each data scientist’s work into a different project to ensure that the jobs, models, and versions created by each data scientist are accessible only to that user.

Use labels to organize resources into descriptive categories. Apply a label to each created resource so that users can filter the results by label when viewing or monitoring the resources.

Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In BigQuery, create a SQL view that maps users to the resources they are using

Discussion 0

Questions 11

You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, theano, scikit-learn, and custom libraries. What should you do?

Options:

Use the Vertex AI Training to submit training jobs using any framework.

Configure Kubeflow to run on Google Kubernetes Engine and submit training jobs through TFJob.

Create a library of VM images on Compute Engine, and publish these images on a centralized repository.

Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.

Discussion 0

Questions 12

You recently trained an XGBoost model on tabular data You plan to expose the model for internal use as an HTTP microservice After deployment you expect a small number of incoming requests. You want to productionize the model with the least amount of effort and latency. What should you do?

Options:

Deploy the model to BigQuery ML by using CREATE model with the BOOSTED-THREE-REGRESSOR statement and invoke the BigQuery API from the microservice.

Build a Flask-based app Package the app in a custom container on Vertex Al and deploy it to Vertex Al Endpoints.

Build a Flask-based app Package the app in a Docker image and deploy it to Google Kubernetes Engine in Autopilot mode.

Use a prebuilt XGBoost Vertex container to create a model and deploy it to Vertex Al Endpoints.

Discussion 0

Questions 13

You built and manage a production system that is responsible for predicting sales numbers. Model accuracy is crucial, because the production model is required to keep up with market changes. Since being deployed to production, the model hasn't changed; however the accuracy of the model has steadily deteriorated. What issue is most likely causing the steady decline in model accuracy?

Options:

Poor data quality

Lack of model retraining

Too few layers in the model for capturing information

Incorrect data split ratio during model training, evaluation, validation, and test

Discussion 0

Questions 14

You need to deploy a scikit-learn classification model to production. The model must be able to serve requests 24/7 and you expect millions of requests per second to the production application from 8 am to 7 pm. You need to minimize the cost of deployment What should you do?

Options:

Deploy an online Vertex Al prediction endpoint Set the max replica count to 1

Deploy an online Vertex Al prediction endpoint Set the max replica count to 100

Deploy an online Vertex Al prediction endpoint with one GPU per replica Set the max replica count to 1.

Deploy an online Vertex Al prediction endpoint with one GPU per replica Set the max replica count to 100.

Discussion 0

Questions 15

You want to migrate a scikrt-learn classifier model to TensorFlow. You plan to train the TensorFlow classifier model using the same training set that was used to train the scikit-learn model and then compare the performances using a common test set. You want to use the Vertex Al Python SDK to manually log the evaluation metrics of each model and compare them based on their F1 scores and confusion matrices. How should you log the metrics?

Options:

Discussion 0

Questions 16

You recently deployed a model lo a Vertex Al endpoint and set up online serving in Vertex Al Feature Store. You have configured a daily batch ingestion job to update your featurestore During the batch ingestion jobs you discover that CPU utilization is high in your featurestores online serving nodes and that feature retrieval latency is high. You need to improve online serving performance during the daily batch ingestion. What should you do?

Options:

Schedule an increase in the number of online serving nodes in your featurestore prior to the batch ingestion jobs.

Enable autoscaling of the online serving nodes in your featurestore

Enable autoscaling for the prediction nodes of your DeployedModel in the Vertex Al endpoint.

Increase the worker counts in the importFeaturevalues request of your batch ingestion job.

Discussion 0

Questions 17

You are building a TensorFlow model for a financial institution that predicts the impact of consumer spending on inflation globally. Due to the size and nature of the data, your model is long-running across all types of hardware, and you have built frequent checkpointing into the training process. Your organization has asked you to minimize cost. What hardware should you choose?

Options:

A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with 4 NVIDIA P100 GPUs

A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with an NVIDIA P100 GPU

A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a non-preemptible v3-8 TPU

A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a preemptible v3-8 TPU

Discussion 0

Questions 18

You developed a custom model by using Vertex Al to predict your application's user churn rate You are using Vertex Al Model Monitoring for skew detection The training data stored in BigQuery contains two sets of features - demographic and behavioral You later discover that two separate models trained on each set perform better than the original model

You need to configure a new model mentioning pipeline that splits traffic among the two models You want to use the same prediction-sampling-rate and monitoring-frequency for each model You also want to minimize management effort What should you do?

Options:

Keep the training dataset as is Deploy the models to two separate endpoints and submit two Vertex Al Model Monitoring jobs with appropriately selected feature-thresholds parameters

Keep the training dataset as is Deploy both models to the same endpoint and submit a Vertex Al Model Monitoring job with a monitoring-config-from parameter that accounts for the model IDs and feature selections

Separate the training dataset into two tables based on demographic and behavioral features Deploy the models to two separate endpoints, and submit two Vertex Al Model Monitoring jobs

Separate the training dataset into two tables based on demographic and behavioral features. Deploy both models to the same endpoint and submit a Vertex Al Model Monitoring job with a monitoring-config-from parameter that accounts for the model IDs and training datasets

Discussion 0

Questions 19

Your organization manages an online message board A few months ago, you discovered an increase in toxic language and bullying on the message board. You deployed an automated text classifier that flags certain comments as toxic or harmful. Now some users are reporting that benign comments referencing their religion are being misclassified as abusive Upon further inspection, you find that your classifier's false positive rate is higher for comments that reference certain underrepresented religious groups. Your team has a limited budget and is already overextended. What should you do?

Options:

Add synthetic training data where those phrases are used in non-toxic ways

Remove the model and replace it with human moderation.

Replace your model with a different text classifier.

Raise the threshold for comments to be considered toxic or harmful

Discussion 0

Questions 20

You are training an ML model on a large dataset. You are using a TPU to accelerate the training process You notice that the training process is taking longer than expected. You discover that the TPU is not reaching its full capacity. What should you do?

Options:

Increase the learning rate

Increase the number of epochs

Decrease the learning rate

Increase the batch size

Discussion 0

Questions 21

You are developing an ML model that predicts the cost of used automobiles based on data such as location, condition model type color, and engine-'battery efficiency. The data is updated every night Car dealerships will use the model to determine appropriate car prices. You created a Vertex Al pipeline that reads the data splits the data into training/evaluation/test sets performs feature engineering trains the model by using the training dataset and validates the model by using the evaluation dataset. You need to configure a retraining workflow that minimizes cost What should you do?

Options:

Compare the training and evaluation losses of the current run If the losses are similar, deploy the model to a Vertex AI endpoint Configure a cron job to redeploy the pipeline every night.

Compare the training and evaluation losses of the current run If the losses are similar deploy the model to a Vertex Al endpoint with training/serving skew threshold model monitoring When the model monitoring threshold is tnggered redeploy the pipeline.

Compare the results to the evaluation results from a previous run If the performance improved deploy the model to a Vertex Al endpoint Configure a cron job to redeploy the pipeline every night.

Compare the results to the evaluation results from a previous run If the performance improved deploy the model to a Vertex Al endpoint with training/serving skew threshold model monitoring. When the model monitoring threshold is triggered, redeploy the pipeline.

Discussion 0

Questions 22

Your company manages a video sharing website where users can watch and upload videos. You need to

create an ML model to predict which newly uploaded videos will be the most popular so that those videos can be prioritized on your company’s website. Which result should you use to determine whether the model is successful?

Options:

The model predicts videos as popular if the user who uploads them has over 10,000 likes.

The model predicts 97.5% of the most popular clickbait videos measured by number of clicks.

The model predicts 95% of the most popular videos measured by watch time within 30 days of being

uploaded.

The Pearson correlation coefficient between the log-transformed number of views after 7 days and 30 days after publication is equal to 0.

Discussion 0

Questions 23

You are developing a model to help your company create more targeted online advertising campaigns. You need to create a dataset that you will use to train the model. You want to avoid creating or reinforcing unfair bias in the model. What should you do?

Choose 2 answers

Options:

Include a comprehensive set of demographic features.

include only the demographic groups that most frequently interact with advertisements.

Collect a random sample of production traffic to build the training dataset.

Collect a stratified sample of production traffic to build the training dataset.

Conduct fairness tests across sensitive categories and demographics on the trained model.

Discussion 0

Questions 24

You work for the AI team of an automobile company, and you are developing a visual defect detection model using TensorFlow and Keras. To improve your model performance, you want to incorporate some image augmentation functions such as translation, cropping, and contrast tweaking. You randomly apply these functions to each training batch. You want to optimize your data processing pipeline for run time and compute resources utilization. What should you do?

Options:

Embed the augmentation functions dynamically in the tf.Data pipeline.

Embed the augmentation functions dynamically as part of Keras generators.

Use Dataflow to create all possible augmentations, and store them as TFRecords.

Use Dataflow to create the augmentations dynamically per training run, and stage them as TFRecords.

Discussion 0

Questions 25

You are developing a mode! to detect fraudulent credit card transactions. You need to prioritize detection because missing even one fraudulent transaction could severely impact the credit card holder. You used AutoML to tram a model on users' profile information and credit card transaction data. After training the initial model, you notice that the model is failing to detect many fraudulent transactions. How should you adjust the training parameters in AutoML to improve model performance?

Choose 2 answers

Options:

Increase the score threshold.

Decrease the score threshold.

Add more positive examples to the training set.

Add more negative examples to the training set.

Reduce the maximum number of node hours for training.

Discussion 0

Answer:

B, C

Explanation:

The best options for adjusting the training parameters in AutoML to improve model performance are to decrease the score threshold and add more positive examples to the training set. These options can help increase the detection rate of fraudulent transactions, which is the priority for this use case. The score threshold is a parameter that determines the minimum probability score that a prediction must have to be classified as positive. Decreasing the score threshold can increase the recall of the model, which is the proportion of actual positive cases that are correctly identified. Increasing the recall can help reduce the number of false negatives, which are fraudulent transactions that are missed by the model. However, decreasing the score threshold can also decrease the precision of the model, which is the proportion of positive predictions that are actually correct. Decreasing the precision can increase the number of false positives, which are legitimate transactions that are flagged as fraudulent by the model. Therefore, there is a trade-off between recall and precision, and the optimal score threshold depends on the business objective and the cost of errors1. Adding more positive examples to the training set can help balance the data distribution and improve the model performance. Positive examples are the instances that belong to the target class, which in this case are fraudulent transactions. Negative examples are the instances that belong to the other class, which in this case are legitimate transactions. Fraudulent transactions are usually rare and imbalanced compared to legitimate transactions, which can cause the model to be biased towards the majority class and fail to learn the characteristics of the minority class. Adding more positive examples can help the model learn more features and patterns of the fraudulent transactions, and increase the detection rate2.

The other options are not as good as options B and C, for the following reasons:

Option A: Increasing the score threshold would decrease the detection rate of fraudulent transactions, which is the opposite of the desired outcome. Increasing the score threshold would decrease the recall of the model, which is the proportion of actual positive cases that are correctly identified. Decreasing the recall would increase the number of false negatives, which are fraudulent transactions that are missed by the model. Increasing the score threshold would increase the precision of the model, which is the proportion of positive predictions that are actually correct. Increasing the precision would decrease the number of false positives, which are legitimate transactions that are flagged as fraudulent by the model. However, in this use case, the cost of false negatives is much higher than the cost of false positives, so increasing the score threshold is not a good option1.
Option D: Adding more negative examples to the training set would not improve the model performance, and could worsen the data imbalance. Negative examples are the instances that belong to the other class, which in this case are legitimate transactions. Legitimate transactions are usually abundant and dominant compared to fraudulent transactions, which can cause the model to be biased towards the majority class and fail to learn the characteristics of the minority class. Adding more negative examples would exacerbate this problem, and decrease the detection rate of the fraudulent transactions2.
Option E: Reducing the maximum number of node hours for training would not improve the model performance, and could limit the model optimization. Node hours are the units of computation that are used to train an AutoML model. The maximum number of node hours is a parameter that determines the upper limit of node hours that can be used for training. Reducing the maximum number of node hours would reduce the training time and cost, but also the model quality and accuracy. Reducing the maximum number of node hours would limit the number of iterations, trials, and evaluations that the model can perform, and prevent the model from finding the optimal hyperparameters and architecture3.

References:

Preparing for Google Cloud Certification: Machine Learning Engineer, Course 5: Responsible AI, Week 4: Evaluation
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 2: Developing high-quality ML models, 2.2 Handling imbalanced data
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 4: Low-code ML Solutions, Section 4.3: AutoML
Understanding the score threshold slider
Handling imbalanced data sets in machine learning
AutoML Vision pricing

Questions 26

You are an ML engineer at a manufacturing company You are creating a classification model for a predictive maintenance use case You need to predict whether a crucial machine will fail in the next three days so that the repair crew has enough time to fix the machine before it breaks. Regular maintenance of the machine is relatively inexpensive, but a failure would be very costly You have trained several binary classifiers to predict whether the machine will fail. where a prediction of 1 means that the ML model predicts a failure.

You are now evaluating each model on an evaluation dataset. You want to choose a model that prioritizes detection while ensuring that more than 50% of the maintenance jobs triggered by your model address an imminent machine failure. Which model should you choose?

Options:

The model with the highest area under the receiver operating characteristic curve (AUC ROC) and precision greater than 0 5

The model with the lowest root mean squared error (RMSE) and recall greater than 0.5.

The model with the highest recall where precision is greater than 0.5.

The model with the highest precision where recall is greater than 0.5.

Discussion 0

Questions 27

You work for a magazine publisher and have been tasked with predicting whether customers will cancel their annual subscription. In your exploratory data analysis, you find that 90% of individuals renew their subscription every year, and only 10% of individuals cancel their subscription. After training a NN Classifier, your model predicts those who cancel their subscription with 99% accuracy and predicts those who renew their subscription with 82% accuracy. How should you interpret these results?

Options:

This is not a good result because the model should have a higher accuracy for those who renew their subscription than for those who cancel their subscription.

This is not a good result because the model is performing worse than predicting that people will always renew their subscription.

This is a good result because predicting those who cancel their subscription is more difficult, since there is less data for this group.

This is a good result because the accuracy across both groups is greater than 80%.

Discussion 0

Questions 28

You are going to train a DNN regression model with Keras APIs using this code:

How many trainable weights does your model have? (The arithmetic below is correct.)

Options:

501*256+257*128+2 = 161154

500*256+256*128+128*2 = 161024

501*256+257*128+128*2=161408

500*256*0 25+256*128*0 25+128*2 = 40448

Discussion 0

Questions 29

You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model building, training, and hyperparameter tuning and serving. What should you do?

Options:

Configure AutoML Tables to perform the classification task

Run a BigQuery ML task to perform logistic regression for the classification

Use Al Platform Notebooks to run the classification model with pandas library

Use Al Platform to run the classification model job configured for hyperparameter tuning

Discussion 0

Questions 30

You have recently trained a scikit-learn model that you plan to deploy on Vertex Al. This model will support both online and batch prediction. You need to preprocess input data for model inference. You want to package the model for deployment while minimizing additional code What should you do?

Options:

1 Upload your model to the Vertex Al Model Registry by using a prebuilt scikit-learn prediction container

2 Deploy your model to Vertex Al Endpoints, and create a Vertex Al batch prediction job that uses the instanceConfig.inscanceType setting to transform your input data

1 Wrap your model in a custom prediction routine (CPR). and build a container image from the CPR local model

2 Upload your sci-kit learn model container to Vertex Al Model Registry

3 Deploy your model to Vertex Al Endpoints, and create a Vertex Al batch prediction job

1. Create a custom container for your sci-kit learn model,

2 Define a custom serving function for your model

3 Upload your model and custom container to Vertex Al Model Registry

4 Deploy your model to Vertex Al Endpoints, and create a Vertex Al batch prediction job

1 Create a custom container for your sci-kit learn model.

2 Upload your model and custom container to Vertex Al Model Registry

3 Deploy your model to Vertex Al Endpoints, and create a Vertex Al batch prediction job that uses the instanceConfig. instanceType setting to transform your input data

Discussion 0

Questions 31

You are training a Resnet model on Al Platform using TPUs to visually categorize types of defects in automobile engines. You capture the training profile using the Cloud TPU profiler plugin and observe that it is highly input-bound. You want to reduce the bottleneck and speed up your model training process. Which modifications should you make to the tf .data dataset?

Choose 2 answers

Options:

Use the interleave option for reading data

Reduce the value of the repeat parameter

Increase the buffer size for the shuffle option.

Set the prefetch option equal to the training batch size

Decrease the batch size argument in your transformation

Discussion 0

Answer:

A, D

Explanation:

The tf.data dataset is a TensorFlow API that provides a way to create and manipulate data pipelines for machine learning. The tf.data dataset allows you to apply various transformations to the data, such as reading, shuffling, batching, prefetching, and interleaving. These transformations can affect the performance and efficiency of the model training process1

One of the common performance issues in model training is input-bound, which means that the model is waiting for the input data to be ready and is not fully utilizing the computational resources. Input-bound can be caused by slow data loading, insufficient parallelism, or large data size. Input-bound can be detected by using the Cloud TPU profiler plugin, which is a tool that helps you analyze the performance of your model on Cloud TPUs. The Cloud TPU profiler plugin can show you the percentage of time that the TPU cores are idle, which indicates input-bound2

To reduce the input-bound bottleneck and speed up the model training process, you can make some modifications to the tf.data dataset. Two of the modifications that can help are:

Use the interleave option for reading data. The interleave option allows you to read data from multiple files in parallel and interleave their records. This can improve the data loading speed and reduce the idle time of the TPU cores. The interleave option can be applied by using the tf.data.Dataset.interleave method, which takes a function that returns a dataset for each input element, and a number of parallel calls3
Set the prefetch option equal to the training batch size. The prefetch option allows you to prefetch the next batch of data while the current batch is being processed by the model. This can reduce the latency between batches and improve the throughput of the model training. The prefetch option can be applied by using the tf.data.Dataset.prefetch method, which takes a buffer size argument. The buffer size should be equal to the training batch size, which is the number of examples per batch4

The other options are not effective or counterproductive. Reducing the value of the repeat parameter will reduce the number of epochs, which is the number of times the model sees the entire dataset. This can affect the model’s accuracy and convergence. Increasing the buffer size for the shuffle option will increase the randomness of the data, but also increase the memory usage and the data loading time. Decreasing the batch size argument in your transformation will reduce the number of examples per batch, which can affect the model’s stability and performance.

References: 1: tf.data: Build TensorFlow input pipelines 2: Cloud TPU Tools in TensorBoard 3: tf.data.Dataset.interleave 4: tf.data.Dataset.prefetch : [Better performance with the tf.data API]

Questions 32

You are training an LSTM-based model on Al Platform to summarize text using the following job submission script:

You want to ensure that training time is minimized without significantly compromising the accuracy of your model. What should you do?

Options:

Modify the 'epochs' parameter

Modify the 'scale-tier' parameter

Modify the batch size' parameter

Modify the 'learning rate' parameter

Discussion 0

Questions 33

You are developing an ML model that uses sliced frames from video feed and creates bounding boxes around specific objects. You want to automate the following steps in your training pipeline: ingestion and preprocessing of data in Cloud Storage, followed by training and hyperparameter tuning of the object model using Vertex AI jobs, and finally deploying the model to an endpoint. You want to orchestrate the entire pipeline with minimal cluster management. What approach should you use?

Options:

Use Kubeflow Pipelines on Google Kubernetes Engine.

Use Vertex AI Pipelines with TensorFlow Extended (TFX) SDK.

Use Vertex AI Pipelines with Kubeflow Pipelines SDK.

Use Cloud Composer for the orchestration.

Discussion 0

Questions 34

You have trained a text classification model in TensorFlow using Al Platform. You want to use the trained model for batch predictions on text data stored in BigQuery while minimizing computational overhead. What should you do?

Options:

Export the model to BigQuery ML.

Deploy and version the model on Al Platform.

Use Dataflow with the SavedModel to read the data from BigQuery

Submit a batch prediction job on Al Platform that points to the model location in Cloud Storage.

Discussion 0

Questions 35

You work for a semiconductor manufacturing company. You need to create a real-time application that automates the quality control process High-definition images of each semiconductor are taken at the end of the assembly line in real time. The photos are uploaded to a Cloud Storage bucket along with tabular data that includes each semiconductor's batch number serial number dimensions, and weight You need to configure model training and serving while maximizing model accuracy. What should you do?

Options:

Use Vertex Al Data Labeling Service to label the images and train an AutoML image classification model.

Deploy the model and configure Pub/Sub to publish a message when an image is categorized into the failing class.

Use Vertex Al Data Labeling Service to label the images and train an AutoML image classification model. Schedule a daily batch prediction job that publishes a Pub/Sub message when the job completes.

Convert the images into an embedding representation Import this data into BigQuery, and train a BigQuery. ML K-means clustenng model with two clusters Deploy the model and configure Pub/Sub to publish a message when a semiconductor's data is categorized into the failing cluster.

Import the tabular data into BigQuery use Vertex Al Data Labeling Service to label the data and train an AutoML tabular classification model Deploy the model and configure Pub/Sub to publish a message when a semiconductor's data is categorized into the failing class.

Discussion 0

Questions 36

You have deployed multiple versions of an image classification model on Al Platform. You want to monitor the performance of the model versions overtime. How should you perform this comparison?

Options:

Compare the loss performance for each model on a held-out dataset.

Compare the loss performance for each model on the validation data

Compare the receiver operating characteristic (ROC) curve for each model using the What-lf Tool

Compare the mean average precision across the models using the Continuous Evaluation feature

Discussion 0

Questions 37

You need to train a natural language model to perform text classification on product descriptions that contain millions of examples and 100,000 unique words. You want to preprocess the words individually so that they can be fed into a recurrent neural network. What should you do?

Options:

Create a hot-encoding of words, and feed the encodings into your model.

Identify word embeddings from a pre-trained model, and use the embeddings in your model.

Sort the words by frequency of occurrence, and use the frequencies as the encodings in your model.

Assign a numerical value to each word from 1 to 100,000 and feed the values as inputs in your model.

Discussion 0

Questions 38

You work at a bank You have a custom tabular ML model that was provided by the bank's vendor. The training data is not available due to its sensitivity. The model is packaged as a Vertex Al Model serving container which accepts a string as input for each prediction instance. In each string the feature values are separated by commas. You want to deploy this model to production for online predictions, and monitor the feature distribution over time with minimal effort What should you do?

Options:

1 Upload the model to Vertex Al Model Registry and deploy the model to a Vertex Ai endpoint.

2. Create a Vertex Al Model Monitoring job with feature drift detection as the monitoring objective, and provide an instance schema.

1 Upload the model to Vertex Al Model Registry and deploy the model to a Vertex Al endpoint.

2 Create a Vertex Al Model Monitoring job with feature skew detection as the monitoring objective and provide an instance schema.

1 Refactor the serving container to accept key-value pairs as input format.

2. Upload the model to Vertex Al Model Registry and deploy the model to a Vertex Al endpoint.

3. Create a Vertex Al Model Monitoring job with feature drift detection as the monitoring objective.

1 Refactor the serving container to accept key-value pairs as input format.

2 Upload the model to Vertex Al Model Registry and deploy the model to a Vertex Al endpoint.

3. Create a Vertex Al Model Monitoring job with feature skew detection as the monitoring objective.

Discussion 0

Questions 39

You have trained a deep neural network model on Google Cloud. The model has low loss on the training data, but is performing worse on the validation data. You want the model to be resilient to overfitting. Which strategy should you use when retraining the model?

Options:

Apply a dropout parameter of 0 2, and decrease the learning rate by a factor of 10

Apply a L2 regularization parameter of 0.4, and decrease the learning rate by a factor of 10.

Run a hyperparameter tuning job on Al Platform to optimize for the L2 regularization and dropout parameters

Run a hyperparameter tuning job on Al Platform to optimize for the learning rate, and increase the number of neurons by a factor of 2.

Discussion 0

Questions 40

You have trained an XGBoost model that you plan to deploy on Vertex Al for online prediction. You are now uploading your model to Vertex Al Model Registry, and you need to configure the explanation method that will serve online prediction requests to be returned with minimal latency. You also want to be alerted when feature attributions of the model meaningfully change over time. What should you do?

Options:

1 Specify sampled Shapley as the explanation method with a path count of 5.

2 Deploy the model to Vertex Al Endpoints.

3. Create a Model Monitoring job that uses prediction drift as the monitoring objective.

1 Specify Integrated Gradients as the explanation method with a path count of 5.

2 Deploy the model to Vertex Al Endpoints.

3. Create a Model Monitoring job that uses prediction drift as the monitoring objective.

1. Specify sampled Shapley as the explanation method with a path count of 50.

2. Deploy the model to Vertex Al Endpoints.

3. Create a Model Monitoring job that uses training-serving skew as the monitoring objective.

1 Specify Integrated Gradients as the explanation method with a path count of 50.

2. Deploy the model to Vertex Al Endpoints.

3 Create a Model Monitoring job that uses training-serving skew as the monitoring objective.

Discussion 0

Questions 41

You work for an organization that operates a streaming music service. You have a custom production model that is serving a "next song" recommendation based on a user’s recent listening history. Your model is deployed on a Vertex Al endpoint. You recently retrained the same model by using fresh data. The model received positive test results offline. You now want to test the new model in production while minimizing complexity. What should you do?

Options:

Create a new Vertex Al endpoint for the new model and deploy the new model to that new endpoint Build a service to randomly send 5% of production traffic to the new endpoint Monitor end-user metrics such as listening time If end-user metrics improve between models over time gradually increase the percentage of production traffic sent to the new endpoint.

Capture incoming prediction requests in BigQuery Create an experiment in Vertex Al Experiments Run batch predictions for both models using the captured data Use the user's selected song to compare the models performance side by side If the new models performance metrics are better than the previous model deploy the new model to production.

Deploy the new model to the existing Vertex Al endpoint Use traffic splitting to send 5% of production traffic to the new model Monitor end-user metrics, such as listening time If end-user metrics improve between models over time, gradually increase the percentage of production traffic sent to the new model.

Configure a model monitoring job for the existing Vertex Al endpoint. Configure the monitoring job to detect prediction drift, and set a threshold for alerts Update the model on the endpoint from the previous model to the new model If you receive an alert of prediction drift, revert to the previous model.

Discussion 0

Questions 42

You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of them converge. How should you resolve the class imbalance problem?

Options:

Use the class distribution to generate 10% positive examples

Use a convolutional neural network with max pooling and softmax activation

Downsample the data with upweighting to create a sample with 10% positive examples

Remove negative examples until the numbers of positive and negative examples are equal

Discussion 0

Questions 43

You work on an operations team at an international company that manages a large fleet of on-premises servers located in few data centers around the world. Your team collects monitoring data from the servers, including CPU/memory consumption. When an incident occurs on a server, your team is responsible for fixing it. Incident data has not been properly labeled yet. Your management team wants you to build a predictive maintenance solution that uses monitoring data from the VMs to detect potential failures and then alerts the service desk team. What should you do first?

Options:

Train a time-series model to predict the machines’ performance values. Configure an alert if a machine’s actual performance values significantly differ from the predicted performance values.

Implement a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Train a model to predict anomalies based on this labeled dataset.

Develop a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Test this heuristic in a production environment.

Hire a team of qualified analysts to review and label the machines’ historical performance data. Train a model based on this manually labeled dataset.

Discussion 0

Questions 44

You are developing a model to identify traffic signs in images extracted from videos taken from the dashboard of a vehicle. You have a dataset of 100 000 images that were cropped to show one out of ten different traffic signs. The images have been labeled accordingly for model training and are stored in a Cloud Storage bucket You need to be able to tune the model during each training run. How should you train the model?

Options:

Train a model for object detection by using Vertex Al AutoML.

Train a model for image classification by using Vertex Al AutoML.

Develop the model training code for object detection and tram a model by using Vertex Al custom training.

Develop the model training code for image classification and train a model by using Vertex Al custom training.

Discussion 0

Questions 45

You are working on a system log anomaly detection model for a cybersecurity organization. You have developed the model using TensorFlow, and you plan to use it for real-time prediction. You need to create a Dataflow pipeline to ingest data via Pub/Sub and write the results to BigQuery. You want to minimize the serving latency as much as possible. What should you do?

Options:

Containerize the model prediction logic in Cloud Run, which is invoked by Dataflow.

Load the model directly into the Dataflow job as a dependency, and use it for prediction.

Deploy the model to a Vertex AI endpoint, and invoke this endpoint in the Dataflow job.

Deploy the model in a TFServing container on Google Kubernetes Engine, and invoke it in the Dataflow job.

Discussion 0

Questions 46

You are developing a recommendation engine for an online clothing store. The historical customer transaction data is stored in BigQuery and Cloud Storage. You need to perform exploratory data analysis (EDA), preprocessing and model training. You plan to rerun these EDA, preprocessing, and training steps as you experiment with different types of algorithms. You want to minimize the cost and development effort of running these steps as you experiment. How should you configure the environment?

Options:

Create a Vertex Al Workbench user-managed notebook using the default VM instance, and use the %%bigquery magic commands in Jupyter to query the tables.

Create a Vertex Al Workbench managed notebook to browse and query the tables directly from the JupyterLab interface.

Create a Vertex Al Workbench user-managed notebook on a Dataproc Hub. and use the %%bigquery magic commands in Jupyter to query the tables.

Create a Vertex Al Workbench managed notebook on a Dataproc cluster, and use the spark-bigquery-connector to access the tables.

Discussion 0

Questions 47

You work for an auto insurance company. You are preparing a proof-of-concept ML application that uses images of damaged vehicles to infer damaged parts Your team has assembled a set of annotated images from damage claim documents in the company's database The annotations associated with each image consist of a bounding box for each identified damaged part and the part name. You have been given a sufficient budget to tram models on Google Cloud You need to quickly create an initial model What should you do?

Options:

Download a pre-trained object detection mode! from TensorFlow Hub Fine-tune the model in Vertex Al Workbench by using the annotated image data.

Train an object detection model in AutoML by using the annotated image data.

Create a pipeline in Vertex Al Pipelines and configure the AutoMLTrainingJobRunOp compon it to train a custom object detection model by using the annotated image data.

Train an object detection model in Vertex Al custom training by using the annotated image data.

Discussion 0

Questions 48

You work for a retailer that sells clothes to customers around the world. You have been tasked with ensuring that ML models are built in a secure manner. Specifically, you need to protect sensitive customer data that might be used in the models. You have identified four fields containing sensitive data that are being used by your data science team: AGE, IS_EXISTING_CUSTOMER, LATITUDE_LONGITUDE, and SHIRT_SIZE. What should you do with the data before it is made available to the data science team for training purposes?

Options:

Tokenize all of the fields using hashed dummy values to replace the real values.

Use principal component analysis (PCA) to reduce the four sensitive fields to one PCA vector.

Coarsen the data by putting AGE into quantiles and rounding LATITUDE_LONGTTUDE into single precision. The other two fields are already as coarse as possible.

Remove all sensitive data fields, and ask the data science team to build their models using non-sensitive data.

Discussion 0

Questions 49

You are an ML engineer at a global car manufacturer. You need to build an ML model to predict car sales in different cities around the world. Which features or feature crosses should you use to train city-specific relationships between car type and number of sales?

Options:

Three individual features binned latitude, binned longitude, and one-hot encoded car type

One feature obtained as an element-wise product between latitude, longitude, and car type

One feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type

Two feature crosses as a element-wise product the first between binned latitude and one-hot encoded car type, and the second between binned longitude and one-hot encoded car type

Discussion 0

Questions 50

You work for a bank with strict data governance requirements. You recently implemented a custom model to detect fraudulent transactions You want your training code to download internal data by using an API endpoint hosted in your projects network You need the data to be accessed in the most secure way, while mitigating the risk of data exfiltration. What should you do?

Options:

Enable VPC Service Controls for peering’s, and add Vertex Al to a service perimeter

Create a Cloud Run endpoint as a proxy to the data Use Identity and Access Management (1AM)

authentication to secure access to the endpoint from the training job.

Configure VPC Peering with Vertex Al and specify the network of the training job

Download the data to a Cloud Storage bucket before calling the training job

Discussion 0

Questions 51

You are training a TensorFlow model on a structured data set with 100 billion records stored in several CSV files. You need to improve the input/output execution performance. What should you do?

Options:

Load the data into BigQuery and read the data from BigQuery.

Load the data into Cloud Bigtable, and read the data from Bigtable

Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage

Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS)

Discussion 0

Questions 52

You have recently developed a new ML model in a Jupyter notebook. You want to establish a reliable and repeatable model training process that tracks the versions and lineage of your model artifacts. You plan to retrain your model weekly. How should you operationalize your training process?

Options:

1. Create an instance of the CustomTrainingJob class with the Vertex AI SDK to train your model.

2. Using the Notebooks API, create a scheduled execution to run the training code weekly.

1. Create an instance of the CustomJob class with the Vertex AI SDK to train your model.

2. Use the Metadata API to register your model as a model artifact.

3. Using the Notebooks API, create a scheduled execution to run the training code weekly.

1. Create a managed pipeline in Vertex Al Pipelines to train your model by using a Vertex Al CustomTrainingJoOp component.

2. Use the ModelUploadOp component to upload your model to Vertex Al Model Registry.

3. Use Cloud Scheduler and Cloud Functions to run the Vertex Al pipeline weekly.

1. Create a managed pipeline in Vertex Al Pipelines to train your model using a Vertex Al HyperParameterTuningJobRunOp component.

2. Use the ModelUploadOp component to upload your model to Vertex Al Model Registry.

3. Use Cloud Scheduler and Cloud Functions to run the Vertex Al pipeline weekly.

Discussion 0

Questions 53

You are working on a prototype of a text classification model in a managed Vertex AI Workbench notebook. You want to quickly experiment with tokenizing text by using a Natural Language Toolkit (NLTK) library. How should you add the library to your Jupyter kernel?

Options:

Install the NLTK library from a terminal by using the pip install nltk command.

Write a custom Dataflow job that uses NLTK to tokenize your text and saves the output to Cloud Storage.

Create a new Vertex Al Workbench notebook with a custom image that includes the NLTK library.

Install the NLTK library from a Jupyter cell by using the! pip install nltk —user command.

Discussion 0

Questions 54

Your team is building a convolutional neural network (CNN)-based architecture from scratch. The preliminary experiments running on your on-premises CPU-only infrastructure were encouraging, but have slow convergence. You have been asked to speed up model training to reduce time-to-market. You want to experiment with virtual machines (VMs) on Google Cloud to leverage more powerful hardware. Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction. Which environment should you train your model on?

Options:

AVM on Compute Engine and 1 TPU with all dependencies installed manually.

AVM on Compute Engine and 8 GPUs with all dependencies installed manually.

A Deep Learning VM with an n1-standard-2 machine and 1 GPU with all libraries pre-installed.

A Deep Learning VM with more powerful CPU e2-highcpu-16 machines with all libraries pre-installed.

Discussion 0

Questions 55

You are experimenting with a built-in distributed XGBoost model in Vertex AI Workbench user-managed notebooks. You use BigQuery to split your data into training and validation sets using the following queries:

CREATE OR REPLACE TABLE ‘myproject.mydataset.training‘ AS

(SELECT * FROM ‘myproject.mydataset.mytable‘ WHERE RAND() <= 0.8);

CREATE OR REPLACE TABLE ‘myproject.mydataset.validation‘ AS

(SELECT * FROM ‘myproject.mydataset.mytable‘ WHERE RAND() <= 0.2);

After training the model, you achieve an area under the receiver operating characteristic curve (AUC ROC) value of 0.8, but after deploying the model to production, you notice that your model performance has dropped to an AUC ROC value of 0.65. What problem is most likely occurring?

Options:

There is training-serving skew in your production environment.

There is not a sufficient amount of training data.

The tables that you created to hold your training and validation records share some records, and you may not be using all the data in your initial table.

The RAND() function generated a number that is less than 0.2 in both instances, so every record in the validation table will also be in the training table.

Discussion 0

Questions 56

You work for a retail company. You have created a Vertex Al forecast model that produces monthly item sales predictions. You want to quickly create a report that will help to explain how the model calculates the predictions. You have one month of recent actual sales data that was not included in the training dataset. How should you generate data for your report?

Options:

Create a batch prediction job by using the actual sales data Compare the predictions to the actuals in the report.

Create a batch prediction job by using the actual sates data and configure the job settings to generate feature attributions. Compare the results in the report.

Generate counterfactual examples by using the actual sales data Create a batch prediction job using the

actual sales data and the counterfactual examples Compare the results in the report.

Train another model by using the same training dataset as the original and exclude some columns. Using the actual sales data create one batch prediction job by using the new model and another one with the original model Compare the two sets of predictions in the report.

Discussion 0

Questions 57

You work for a bank You have been asked to develop an ML model that will support loan application decisions. You need to determine which Vertex Al services to include in the workflow You want to track the model's training parameters and the metrics per training epoch. You plan to compare the performance of each version of the model to determine the best model based on your chosen metrics. Which Vertex Al services should you use?

Options:

Vertex ML Metadata Vertex Al Feature Store, and Vertex Al Vizier

Vertex Al Pipelines. Vertex Al Experiments, and Vertex Al Vizier

Vertex ML Metadata Vertex Al Experiments, and Vertex Al TensorBoard

Vertex Al Pipelines. Vertex Al Feature Store, and Vertex Al TensorBoard

Discussion 0

Questions 58

You work at a gaming startup that has several terabytes of structured data in Cloud Storage. This data includes gameplay time data user metadata and game metadata. You want to build a model that recommends new games to users that requires the least amount of coding. What should you do?

Options:

Load the data in BigQuery Use BigQuery ML to tram an Autoencoder model.

Load the data in BigQuery Use BigQuery ML to train a matrix factorization model.

Read data to a Vertex Al Workbench notebook Use TensorFlow to train a two-tower model.

Read data to a Vertex AI Workbench notebook Use TensorFlow to train a matrix factorization model.

Discussion 0

Questions 59

You are using Keras and TensorFlow to develop a fraud detection model Records of customer transactions are stored in a large table in BigQuery. You need to preprocess these records in a cost-effective and efficient way before you use them to train the model. The trained model will be used to perform batch inference in BigQuery. How should you implement the preprocessing workflow?

Options:

Implement a preprocessing pipeline by using Apache Spark, and run the pipeline on Dataproc Save the preprocessed data as CSV files in a Cloud Storage bucket.

Load the data into a pandas DataFrame Implement the preprocessing steps using panda’s transformations. and train the model directly on the DataFrame.

Perform preprocessing in BigQuery by using SQL Use the BigQueryClient in TensorFlow to read the data directly from BigQuery.

Implement a preprocessing pipeline by using Apache Beam, and run the pipeline on Dataflow Save the preprocessed data as CSV files in a Cloud Storage bucket.

Discussion 0

Questions 60

You recently deployed a pipeline in Vertex Al Pipelines that trains and pushes a model to a Vertex Al endpoint to serve real-time traffic. You need to continue experimenting and iterating on your pipeline to improve model performance. You plan to use Cloud Build for CI/CD You want to quickly and easily deploy new pipelines into production and you want to minimize the chance that the new pipeline implementations will break in production. What should you do?

Options:

Set up a CI/CD pipeline that builds and tests your source code If the tests are successful use the Google Cloud console to upload the built container to Artifact Registry and upload the compiled pipeline to Vertex Al Pipelines.

Set up a CI/CD pipeline that builds your source code and then deploys built artifacts into a pre-production environment Run unit tests in the pre-production environment If the tests are successful deploy the pipeline to production.

Set up a CI/CD pipeline that builds and tests your source code and then deploys built artifacts into a pre-production environment. After a successful pipeline run in the pre-production environment deploy the pipeline to production

Set up a CI/CD pipeline that builds and tests your source code and then deploys built arrets into a pre-production environment After a successful pipeline run in the pre-production environment, rebuild the source code, and deploy the artifacts to production

Discussion 0

Questions 61

You are building a MLOps platform to automate your company's ML experiments and model retraining. You need to organize the artifacts for dozens of pipelines How should you store the pipelines' artifacts'?

Options:

Store parameters in Cloud SQL and store the models' source code and binaries in GitHub

Store parameters in Cloud SQL store the models' source code in GitHub, and store the models' binaries in Cloud Storage.

Store parameters in Vertex ML Metadata store the models' source code in GitHub and store the models' binaries in Cloud Storage.

Store parameters in Vertex ML Metadata and store the models source code and binaries in GitHub.

Discussion 0

Questions 62

You work for a company that captures live video footage of checkout areas in their retail stores You need to use the live video footage to build a mode! to detect the number of customers waiting for service in near real time You want to implement a solution quickly and with minimal effort How should you build the model?

Options:

Use the Vertex Al Vision Occupancy Analytics model.

Use the Vertex Al Vision Person/vehicle detector model

Train an AutoML object detection model on an annotated dataset by using Vertex AutoML

Train a Seq2Seq+ object detection model on an annotated dataset by using Vertex AutoML

Discussion 0

Questions 63

You are pre-training a large language model on Google Cloud. This model includes custom TensorFlow operations in the training loop Model training will use a large batch size, and you expect training to take several weeks You need to configure a training architecture that minimizes both training time and compute costs What should you do?

Options:

Discussion 0

Questions 64

You are building a custom image classification model and plan to use Vertex Al Pipelines to implement the end-to-end training. Your dataset consists of images that need to be preprocessed before they can be used to train the model. The preprocessing steps include resizing the images, converting them to grayscale, and extracting features. You have already implemented some Python functions for the preprocessing tasks. Which components should you use in your pipeline'?

Options:

Discussion 0

Questions 65

You work with a team of researchers to develop state-of-the-art algorithms for financial analysis. Your team develops and debugs complex models in TensorFlow. You want to maintain the ease of debugging while also reducing the model training time. How should you set up your training environment?

Options:

Configure a v3-8 TPU VM SSH into the VM to tram and debug the model.

Configure a v3-8 TPU node Use Cloud Shell to SSH into the Host VM to train and debug the model.

Configure a M-standard-4 VM with 4 NVIDIA P100 GPUs SSH into the VM and use

Parameter Server Strategy to train the model.

Configure a M-standard-4 VM with 4 NVIDIA P100 GPUs SSH into the VM and use

MultiWorkerMirroredStrategy to train the model.

Discussion 0

Questions 66

You are analyzing customer data for a healthcare organization that is stored in Cloud Storage. The data contains personally identifiable information (PII) You need to perform data exploration and preprocessing while ensuring the security and privacy of sensitive fields What should you do?

Options:

Use the Cloud Data Loss Prevention (DLP) API to de-identify the PI! before performing data exploration and preprocessing.

Use customer-managed encryption keys (CMEK) to encrypt the Pll data at rest and decrypt the Pll data during data exploration and preprocessing.

Use a VM inside a VPC Service Controls security perimeter to perform data exploration and preprocessing.

Use Google-managed encryption keys to encrypt the Pll data at rest, and decrypt the Pll data during data exploration and preprocessing.

Discussion 0

Questions 67

You created a model that uses BigQuery ML to perform linear regression. You need to retrain the model on the cumulative data collected every week. You want to minimize the development effort and the scheduling cost. What should you do?

Options:

Use BigQuerys scheduling service to run the model retraining query periodically.

Create a pipeline in Vertex Al Pipelines that executes the retraining query and use the Cloud Scheduler API to run the query weekly.

Use Cloud Scheduler to trigger a Cloud Function every week that runs the query for retraining the model.

Use the BigQuery API Connector and Cloud Scheduler to trigger. Workflows every week that retrains the model.

Discussion 0

Questions 68

You work for a company that sells corporate electronic products to thousands of businesses worldwide. Your company stores historical customer data in BigQuery. You need to build a model that predicts customer lifetime value over the next three years. You want to use the simplest approach to build the model and you want to have access to visualization tools. What should you do?

Options:

Create a Vertex Al Workbench notebook to perform exploratory data analysis. Use IPython magics to create a new BigQuery table with input features Use the BigQuery console to run the create model statement Validate the results by using the ml. evaluate and ml. predict statements.

Run the create model statement from the BigQuery console to create an AutoML model Validate the results by using the ml. evaluate and ml. predict statements.

Create a Vertex Al Workbench notebook to perform exploratory data analysis and create input features Save the features as a CSV file in Cloud Storage Import the CSV file as a new BigQuery table Use the BigQuery console to run the create model statement Validate the results by using the ml. evaluate and ml. predict statements.

Create a Vertex Al Workbench notebook to perform exploratory data analysis Use IPython magics to create a new BigQuery table with input features, create the model and validate the results by using the create model, ml. evaluates, and ml. predict statements.

Discussion 0

Questions 69

You work at a mobile gaming startup that creates online multiplayer games Recently, your company observed an increase in players cheating in the games, leading to a loss of revenue and a poor user experience. You built a binary classification model to determine whether a player cheated after a completed game session, and then send a message to other downstream systems to ban the player that cheated Your model has performed well during testing, and you now need to deploy the model to production You want your serving solution to provide immediate classifications after a completed game session to avoid further loss of revenue. What should you do?

Options:

Import the model into Vertex Al Model Registry. Use the Vertex Batch Prediction service to run batch inference jobs.

Save the model files in a Cloud Storage Bucket Create a Cloud Function to read the model files and make online inference requests on the Cloud Function.

Save the model files in a VM Load the model files each time there is a prediction request and run an inference job on the VM.

Import the model into Vertex Al Model Registry Create a Vertex Al endpoint that hosts the model and make online inference requests.

Discussion 0

Questions 70

You have a custom job that runs on Vertex Al on a weekly basis The job is Implemented using a proprietary ML workflow that produces the datasets. models, and custom artifacts, and sends them to a Cloud Storage bucket Many different versions of the datasets and models were created Due to compliance requirements, your company needs to track which model was used for making a particular prediction, and needs access to the artifacts for each model. How should you configure your workflows to meet these requirement?

Options:

Configure a TensorFlow Extended (TFX) ML Metadata database, and use the ML Metadata API.

Create a Vertex Al experiment, and enable autologging inside the custom job

Use the Vertex Al Metadata API inside the custom Job to create context, execution, and artifacts for each model, and use events to link them together.

Register each model in Vertex Al Model Registry, and use model labels to store the related dataset and model information.

Discussion 0

Questions 71

You work at a gaming startup that has several terabytes of structured data in Cloud Storage. This data includes gameplay time data, user metadata, and game metadata. You want to build a model that recommends new games to users that requires the least amount of coding. What should you do?

Options:

Load the data in BigQuery. Use BigQuery ML to train an Autoencoder model.

Load the data in BigQuery. Use BigQuery ML to train a matrix factorization model.

Read data to a Vertex Al Workbench notebook. Use TensorFlow to train a two-tower model.

Read data to a Vertex Al Workbench notebook. Use TensorFlow to train a matrix factorization model.

Discussion 0

Questions 72

As the lead ML Engineer for your company, you are responsible for building ML models to digitize scanned customer forms. You have developed a TensorFlow model that converts the scanned images into text and stores them in Cloud Storage. You need to use your ML model on the aggregated data collected at the end of each day with minimal manual intervention. What should you do?

Options:

Use the batch prediction functionality of Al Platform

Create a serving pipeline in Compute Engine for prediction

Use Cloud Functions for prediction each time a new data point is ingested

Deploy the model on Al Platform and create a version of it for online inference.

Discussion 0

Questions 73

You work for a company that is developing an application to help users with meal planning You want to use machine learning to scan a corpus of recipes and extract each ingredient (e g carrot, rice pasta) and each kitchen cookware (e.g. bowl, pot spoon) mentioned Each recipe is saved in an unstructured text file What should you do?

Options:

Create a text dataset on Vertex Al for entity extraction Create two entities called ingredient" and cookware" and label at least 200 examples of each entity Train an AutoML entity extraction model to extract occurrences of these entity types Evaluate performance on a holdout dataset.

Create a multi-label text classification dataset on Vertex Al Create a test dataset and label each recipe that corresponds to its ingredients and cookware Train a multi-class classification model Evaluate the model’s performance on a holdout dataset.

Use the Entity Analysis method of the Natural Language API to extract the ingredients and cookware from each recipe Evaluate the model's performance on a prelabeled dataset.

Create a text dataset on Vertex Al for entity extraction Create as many entities as there are different ingredients and cookware Train an AutoML entity extraction model to extract those entities Evaluate the models performance on a holdout dataset.

Discussion 0

Questions 74

You are training an ML model using data stored in BigQuery that contains several values that are considered Personally Identifiable Information (Pll). You need to reduce the sensitivity of the dataset before training your model. Every column is critical to your model. How should you proceed?

Options:

Using Dataflow, ingest the columns with sensitive data from BigQuery, and then randomize the values in each sensitive column.

Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow with the DLP API to encrypt sensitive values with Format Preserving Encryption

Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow to replace all sensitive data by using the encryption algorithm AES-256 with a salt.

Before training, use BigQuery to select only the columns that do not contain sensitive data Create an authorized view of the data so that sensitive values cannot be accessed by unauthorized individuals.

Discussion 0

Questions 75

You work for a retail company. You have been tasked with building a model to determine the probability of churn for each customer. You need the predictions to be interpretable so the results can be used to develop marketing campaigns that target at-risk customers. What should you do?

Options:

Build a random forest regression model in a Vertex Al Workbench notebook instance Configure the model to generate feature importance’s after the model is trained.

Build an AutoML tabular regression model Configure the model to generate explanations when it makes predictions.

Build a custom TensorFlow neural network by using Vertex Al custom training Configure the model to generate explanations when it makes predictions.

Build a random forest classification model in a Vertex Al Workbench notebook instance Configure the model to generate feature importance’s after the model is trained.

Discussion 0

Questions 76

You developed a BigQuery ML linear regressor model by using a training dataset stored in a BigQuery table. New data is added to the table every minute. You are using Cloud Scheduler and Vertex Al Pipelines to automate hourly model training, and use the model for direct inference. The feature preprocessing logic includes quantile bucketization and MinMax scaling on data received in the last hour. You want to minimize storage and computational overhead. What should you do?

Options:

Create a component in the Vertex Al Pipelines directed acyclic graph (DAG) to calculate the required statistics, and pass the statistics on to subsequent components.

Preprocess and stage the data in BigQuery prior to feeding it to the model during training and inference.

Create SQL queries to calculate and store the required statistics in separate BigQuery tables that are referenced in the CREATE MODEL statement.

Use the TRANSFORM clause in the CREATE MODEL statement in the SQL query to calculate the required statistics.

Discussion 0

Questions 77

You are building an ML model to predict trends in the stock market based on a wide range of factors. While exploring the data, you notice that some features have a large range. You want to ensure that the features with the largest magnitude don’t overfit the model. What should you do?

Options:

Standardize the data by transforming it with a logarithmic function.

Apply a principal component analysis (PCA) to minimize the effect of any particular feature.

Use a binning strategy to replace the magnitude of each feature with the appropriate bin number.

Normalize the data by scaling it to have values between 0 and 1.

Discussion 0

Questions 78

You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32-cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?

Options:

Increase the instance memory to 512 GB and increase the batch size.

Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.

Enable early stopping in your Vertex AI Training job.

Use the tf.distribute.Strategy API and run a distributed training job.

Discussion 0

Questions 79

You are developing an ML model in a Vertex Al Workbench notebook. You want to track artifacts and compare models during experimentation using different approaches. You need to rapidly and easily transition successful experiments to production as you iterate on your model implementation. What should you do?

Options:

1 Initialize the Vertex SDK with the name of your experiment Log parameters and metrics for each experiment, and attach dataset and model artifacts as inputs and outputs to each execution.

2 After a successful experiment create a Vertex Al pipeline.

1. Initialize the Vertex SDK with the name of your experiment Log parameters and metrics for each experiment, save your dataset to a Cloud Storage bucket and upload the models to Vertex Al Model Registry.

2 After a successful experiment create a Vertex Al pipeline.

1 Create a Vertex Al pipeline with parameters you want to track as arguments to your Pipeline Job Use the Metrics. Model, and Dataset artifact types from the Kubeflow Pipelines DSL as the inputs and outputs of the components in your pipeline.

2. Associate the pipeline with your experiment when you submit the job.

1 Create a Vertex Al pipeline Use the Dataset and Model artifact types from the Kubeflow Pipelines. DSL as the inputs and outputs of the components in your pipeline.

2. In your training component use the Vertex Al SDK to create an experiment run Configure the log_params and log_metrics functions to track parameters and metrics of your experiment.

Discussion 0

Questions 80

You developed a Vertex Al ML pipeline that consists of preprocessing and training steps and each set of steps runs on a separate custom Docker image Your organization uses GitHub and GitHub Actions as CI/CD to run unit and integration tests You need to automate the model retraining workflow so that it can be initiated both manually and when a new version of the code is merged in the main branch You want to minimize the steps required to build the workflow while also allowing for maximum flexibility How should you configure the CI/CD workflow?

Options:

Trigger a Cloud Build workflow to run tests build custom Docker images, push the images to Artifact Registry and launch the pipeline in Vertex Al Pipelines.

Trigger GitHub Actions to run the tests launch a job on Cloud Run to build custom Docker images push the images to Artifact Registry and launch the pipeline in Vertex Al Pipelines.

Trigger GitHub Actions to run the tests build custom Docker images push the images to Artifact Registry, and launch the pipeline in Vertex Al Pipelines.

Trigger GitHub Actions to run the tests launch a Cloud Build workflow to build custom Dicker images, push the images to Artifact Registry, and launch the pipeline in Vertex Al Pipelines.

Discussion 0

Labour Day Special 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: exams65

examsbrite logo

Navigation:

Google Professional Machine Learning Engineer Professional-Machine-Learning-Engineer Exam Questions with Experts Answers Updated Recently

Google Professional Machine Learning Engineer Question and Answers

Professional-Machine-Learning-Engineer PDF

Professional-Machine-Learning-Engineer Testing Engine

Professional-Machine-Learning-Engineer PDF + Testing Engine

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Answer:

Explanation:

Options:

Options:

Answer:

Explanation:

Options:

Options:

Options:

Options:

Options:

Options:

Answer:

Explanation:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options:

Options: