

AWS Certified Machine Learning Engineer - Associate (MLA-C01) Questions and Answers

Last Update: Feb 6, 2026
Total Questions: 207

Question 1

A company plans to use Amazon SageMaker AI to build image classification models. The company has 6 TB of training data stored on Amazon FSx for NetApp ONTAP. The file system is in the same VPC as SageMaker AI.

An ML engineer must make the training data accessible to SageMaker AI training jobs.

Which solution will meet these requirements?

Options:

A.  

Mount the FSx for ONTAP file system as a volume to the SageMaker AI instance.

B.  

Create an Amazon S3 bucket and use Mountpoint for Amazon S3 to link the bucket to FSx for ONTAP.

C.  

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

D.  

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Question 2

A company has deployed a model to predict the churn rate for its games by using Amazon SageMaker Studio. After the model is deployed, the company must monitor the model's performance for data drift and inspect the report. Select and order the correct steps from the following list to set up the model monitoring actions. Select each step one time. (Select and order THREE.)

• Check the analysis results on the SageMaker Studio console.

• Create a Shapley Additive Explanations (SHAP) baseline for the model by using Amazon SageMaker Clarify.

• Schedule an hourly model explainability monitor.

Options:

Question 3

A company runs its ML workflows on an on-premises Kubernetes cluster. The ML workflows include ML services that perform training and inferences for ML models. Each ML service runs from its own standalone Docker image.

The company needs to perform a lift and shift from the on-premises Kubernetes cluster to an Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

Which solution will meet this requirement with the LEAST operational overhead?

Options:

A.  

Redesign the ML services to be configured in Kubeflow. Deploy the new Kubeflow managed ML services to the EKS cluster.

B.  

Upload the Docker images to an Amazon Elastic Container Registry (Amazon ECR) repository. Configure a deployment pipeline to deploy the images to the EKS cluster.

C.  

Migrate the training data to an Amazon Redshift cluster. Retrain the models from the migrated training data by using Amazon Redshift ML. Deploy the retrained models to the EKS cluster.

D.  

Configure an Amazon SageMaker AI notebook. Retrain the models with the same code. Deploy the retrained models to the EKS cluster.

Question 4

An ML engineer is using Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect very high or very low machine operating temperatures compared to normal. The ML engineer sets the Severity parameter to Low and above. The ML engineer sets the Direction parameter to All.

What effect will the ML engineer observe in the anomaly detection results if the ML engineer changes the Direction parameter to Lower than expected?

Options:

A.  

Increased anomaly identification frequency and increased recall

B.  

Decreased anomaly identification frequency and decreased recall

C.  

Increased anomaly identification frequency and decreased recall

D.  

Decreased anomaly identification frequency and increased recall

Question 5

A company regularly receives new training data from a vendor of an ML model. The vendor delivers cleaned and prepared data to the company’s Amazon S3 bucket every 3–4 days.

The company has an Amazon SageMaker AI pipeline to retrain the model. An ML engineer needs to run the pipeline automatically when new data is uploaded to the S3 bucket.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A.  

Create an S3 lifecycle rule to transfer the data to the SageMaker AI training instance and initiate training.

B.  

Create an AWS Lambda function that scans the S3 bucket and initiates the pipeline when new data is uploaded.

C.  

Create an Amazon EventBridge rule that matches S3 upload events. Configure the SageMaker pipeline as the rule's target.

D.  

Use Amazon Managed Workflows for Apache Airflow (MWAA) to orchestrate the pipeline when new data is uploaded.
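
For context, a minimal boto3 sketch of the EventBridge approach described in option C; the bucket name, pipeline ARN, and role ARN below are hypothetical, and EventBridge notifications must be enabled on the bucket:

```python
import boto3

events = boto3.client("events")

# Match "Object Created" events from the vendor's S3 bucket.
events.put_rule(
    Name="start-retraining-on-upload",
    EventPattern="""{
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["vendor-training-data"]}}
    }""",
    State="ENABLED",
)

# Target the SageMaker pipeline directly; no Lambda glue code is needed.
events.put_targets(
    Rule="start-retraining-on-upload",
    Targets=[{
        "Id": "retraining-pipeline",
        "Arn": "arn:aws:sagemaker:us-east-1:111122223333:pipeline/retraining",
        "RoleArn": "arn:aws:iam::111122223333:role/EventBridgePipelineRole",
        "SageMakerPipelineParameters": {"PipelineParameterList": []},
    }],
)
```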

Question 6

A company uses an ML model to recommend videos to users. The model is deployed on Amazon SageMaker AI. The model performed well initially after deployment, but the model's performance has degraded over time.

Which solution can the company use to identify model drift in the future?

Options:

A.  

Create a monitoring job in SageMaker Model Monitor. Then create a baseline from the training dataset.

B.  

Create a baseline from the training dataset. Then create a monitoring job in SageMaker Model Monitor.

C.  

Create a baseline by using a built-in rule in SageMaker Clarify. Monitor the drift in Amazon CloudWatch.

D.  

Retrain the model on new data. Compare the retrained model's performance to the original model's performance.
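
As a hedged sketch of the baseline-first ordering that options A and B differ on, using the SageMaker Python SDK (role, S3 paths, and endpoint name are hypothetical):

```python
from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # hypothetical role
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Step 1: create a baseline from the training dataset.
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitor/baseline",
)

# Step 2: schedule the monitoring job against the live endpoint.
monitor.create_monitoring_schedule(
    monitor_schedule_name="video-recs-drift",
    endpoint_input="video-recs-endpoint",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```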

Question 7

A company is using Amazon SageMaker and millions of files to train an ML model. Each file is several megabytes in size. The files are stored in an Amazon S3 bucket. The company needs to improve training performance.

Which solution will meet these requirements in the LEAST amount of time?

Options:

A.  

Transfer the data to a new S3 bucket that provides S3 Express One Zone storage. Adjust the training job to use the new S3 bucket.

B.  

Create an Amazon FSx for Lustre file system. Link the file system to the existing S3 bucket. Adjust the training job to read from the file system.

C.  

Create an Amazon Elastic File System (Amazon EFS) file system. Transfer the existing data to the file system. Adjust the training job to read from the file system.

D.  

Create an Amazon ElastiCache (Redis OSS) cluster. Link the Redis OSS cluster to the existing S3 bucket. Stream the data from the Redis OSS cluster directly to the training job.
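
For context, a sketch of how a training job can read from an FSx for Lustre file system (option B) with the SageMaker Python SDK; the file system ID and paths are hypothetical:

```python
from sagemaker.inputs import FileSystemInput

# The Lustre file system is linked to the existing S3 bucket, so training
# reads the millions of small files at file-system speed instead of
# issuing per-object S3 GET requests.
train_input = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",   # hypothetical file system ID
    file_system_type="FSxLustre",
    directory_path="/fsx/training-data",
    file_system_access_mode="ro",
)

# estimator.fit({"train": train_input})  # the estimator also needs VPC/subnet config
```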

Question 8

A company uses a training job on Amazon SageMaker AI to train a neural network. The job first trains a model and then evaluates the model's performance against a test dataset. The company uses the results from the evaluation phase to decide if the trained model will go to production.

The training phase takes too long. The company needs solutions that can shorten training time without decreasing the model's final performance.

Select the correct solutions from the following list to meet the requirements for each description. Select each solution one time or not at all. (Select THREE.)

• Change the epoch count.

• Choose an Amazon EC2 Spot Fleet.

• Change the batch size.

• Use early stopping on the training job.

• Use the SageMaker AI distributed data parallelism (SMDDP) library.

• Stop the training job.

Options:
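
One of the listed solutions, the SMDDP library, is enabled through the estimator's distribution setting. A minimal sketch with the SageMaker Python SDK; the script, role, and instance choices are hypothetical:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                  # hypothetical training script
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    framework_version="2.0",
    py_version="py310",
    instance_count=2,
    instance_type="ml.p4d.24xlarge",         # SMDDP supports a limited set of GPU instances
    # Shard each batch across GPUs and instances to shorten wall-clock training time.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
```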

Question 9

A company must install a custom script on any newly created Amazon SageMaker AI notebook instances.

Which solution will meet this requirement with the LEAST operational overhead?

Options:

A.  

Create a lifecycle configuration script to install the custom script when a new SageMaker AI notebook is created. Attach the lifecycle configuration to every new SageMaker AI notebook as part of the creation steps.

B.  

Create a custom Amazon Elastic Container Registry (Amazon ECR) image that contains the custom script. Push the ECR image to a Docker registry. Attach the Docker image to a SageMaker Studio domain. Select the kernel to run as part of the SageMaker AI notebook.

C.  

Create a custom package index repository. Use AWS CodeArtifact to manage the installation of the custom script. Set up AWS PrivateLink endpoints to connect CodeArtifact to the SageMaker AI instance. Install the script.

D.  

Store the custom script in Amazon S3. Create an AWS Lambda function to install the custom script on new SageMaker AI notebooks. Configure Amazon EventBridge to invoke the Lambda function when a new SageMaker AI notebook is initialized.
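
A hedged boto3 sketch of option A's lifecycle configuration; the script content and names are hypothetical:

```python
import base64
import boto3

sagemaker = boto3.client("sagemaker")

# Hypothetical shell commands to run once, when the notebook instance is created.
on_create = "#!/bin/bash\nset -e\n/home/ec2-user/install-custom-script.sh\n"

sagemaker.create_notebook_instance_lifecycle_config(
    NotebookInstanceLifecycleConfigName="install-custom-script",
    OnCreate=[{"Content": base64.b64encode(on_create.encode()).decode()}],
)

# New notebook instances then reference the config at creation time:
# sagemaker.create_notebook_instance(..., LifecycleConfigName="install-custom-script")
```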

Question 10

A digital media entertainment company needs real-time video content moderation to ensure compliance during live streaming events.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Use Amazon Rekognition and AWS Lambda to extract and analyze the metadata from the videos' image frames.

B.  

Use Amazon Rekognition and a large language model (LLM) hosted on Amazon Bedrock to extract and analyze the metadata from the videos’ image frames.

C.  

Use Amazon SageMaker AI to extract and analyze the metadata from the videos' image frames.

D.  

Use Amazon Transcribe and Amazon Comprehend to extract and analyze the metadata from the videos' image frames.
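
For illustration, a sketch of frame-level moderation with Amazon Rekognition, the building block in option A; the frame file and confidence threshold are hypothetical:

```python
import boto3

rekognition = boto3.client("rekognition")

with open("frame.jpg", "rb") as f:   # hypothetical frame extracted from the live stream
    frame_bytes = f.read()

response = rekognition.detect_moderation_labels(
    Image={"Bytes": frame_bytes},
    MinConfidence=80,                # hypothetical confidence threshold
)

# Each label names a category of potentially non-compliant content.
for label in response["ModerationLabels"]:
    print(label["Name"], label["Confidence"])
```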

Question 11

A company needs to combine data from multiple sources. The company must use Amazon Redshift Serverless to query an AWS Glue Data Catalog database and underlying data that is stored in an Amazon S3 bucket.

Select and order the correct steps from the following list to meet these requirements. Select each step one time or not at all. (Select and order three.)

• Attach the IAM role to the Redshift cluster.

• Attach the IAM role to the Redshift namespace.

• Create an external database in Amazon Redshift to point to the Data Catalog schema.

• Create an external schema in Amazon Redshift to point to the Data Catalog database.

• Create an IAM role for Amazon Redshift to use to access only the S3 bucket that contains underlying data.

• Create an IAM role for Amazon Redshift to use to access the Data Catalog and the S3 bucket that contains underlying data.

Options:
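
As a hedged sketch, the external schema step from the list can be issued through the Redshift Data API; the workgroup, database, catalog database, and role ARN are hypothetical:

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Creates an external schema in Redshift Serverless that points to the
# AWS Glue Data Catalog database, using the IAM role attached for access.
redshift_data.execute_statement(
    WorkgroupName="ml-workgroup",    # hypothetical Redshift Serverless workgroup
    Database="dev",
    Sql="""
        CREATE EXTERNAL SCHEMA sales_ext
        FROM DATA CATALOG
        DATABASE 'sales_catalog_db'
        IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftSpectrumRole'
    """,
)
```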

Question 12

A company is developing an ML model to forecast future values based on time series data. The dataset includes historical measurements collected at regular intervals and categorical features. The model needs to predict future values based on past patterns and trends.

Which algorithm and hyperparameters should the company use to develop the model?

Options:

A.  

Use the Amazon SageMaker AI XGBoost algorithm. Set the scale_pos_weight hyperparameter to adjust for class imbalance.

B.  

Use k-means clustering with k to specify the number of clusters.

C.  

Use the Amazon SageMaker AI DeepAR algorithm with matching context length and prediction length hyperparameters.

D.  

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm with contamination to set the expected proportion of anomalies.
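
A minimal sketch of option C's DeepAR setup with the SageMaker Python SDK; the role, paths, and hyperparameter values are hypothetical:

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
image = image_uris.retrieve("forecasting-deepar", session.boto_region_name)

estimator = Estimator(
    image_uri=image,
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # hypothetical role
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    output_path="s3://my-bucket/deepar/output",
)

# context_length controls how much history the model sees per sample;
# prediction_length is the forecast horizon it learns to produce.
estimator.set_hyperparameters(
    time_freq="D",
    context_length=30,
    prediction_length=30,
    epochs=100,
)
# estimator.fit({"train": "s3://my-bucket/deepar/train/"})
```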

Question 13

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor.

Which solution will meet this requirement?

Options:

A.  

Configure the competitor's name as a blocked phrase in Amazon Q Business.

B.  

Configure an Amazon Q Business retriever to exclude the competitor's name.

C.  

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.

D.  

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.
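
For context, a hedged sketch of option A, assuming the Amazon Q Business UpdateChatControlsConfiguration API's blocked-phrases fields; the application ID and phrase are hypothetical:

```python
import boto3

qbusiness = boto3.client("qbusiness")

# Blocked phrases are filtered out of Amazon Q Business responses.
qbusiness.update_chat_controls_configuration(
    applicationId="a1b2c3d4-5678-90ab-cdef-EXAMPLE11111",  # hypothetical application ID
    blockedPhrasesConfigurationUpdate={
        "blockedPhrasesToCreateOrUpdate": ["Example Competitor Inc"]
    },
)
```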

Question 14

A company uses AWS CodePipeline to orchestrate a continuous integration and continuous delivery (CI/CD) pipeline for ML models and applications.

Select and order the steps from the following list to describe a CI/CD process for a successful deployment. Select each step one time. (Select and order FIVE.)

• CodePipeline deploys ML models and applications to production.

• CodePipeline detects code changes and starts to build automatically.

• Human approval is provided after testing is successful.

• The company builds and deploys ML models and applications to staging servers for testing.

• The company commits code changes or new training datasets to a Git repository.

Options:

Question 15

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of data quality for the models and must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

Options:

A.  

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and send alerts.

B.  

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and send alerts.

C.  

Deploy the models by using Amazon ECS on AWS Fargate. Use Amazon EventBridge to monitor the data quality and send alerts.

D.  

Deploy the models by using Amazon SageMaker AI batch transform. Use SageMaker Model Monitor to monitor the data quality and send alerts.

Question 16

An ML engineer is using Amazon SageMaker Canvas to build a custom ML model from an imported dataset. The model must make continuous numeric predictions based on 10 years of data.

Which metric should the ML engineer use to evaluate the model’s performance?

Options:

A.  

Accuracy

B.  

InferenceLatency

C.  

Area Under the ROC Curve (AUC)

D.  

Root Mean Square Error (RMSE)
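
For reference, root mean square error (option D) for a model that predicts continuous numeric values is

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \]

where \(y_i\) are the observed values and \(\hat{y}_i\) are the predictions. Lower values mean the numeric predictions sit closer to the observed values, which is why RMSE applies to regression rather than classification metrics such as accuracy or AUC.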

Question 17

A company is developing an ML model by using Amazon SageMaker AI. The company must monitor bias in the model and display the results on a dashboard. An ML engineer creates a bias monitoring job.

How should the ML engineer capture bias metrics to display on the dashboard?

Options:

A.  

Capture AWS CloudTrail metrics from SageMaker Clarify.

B.  

Capture Amazon CloudWatch metrics from SageMaker Clarify.

C.  

Capture SageMaker Model Monitor metrics from Amazon EventBridge.

D.  

Capture SageMaker Model Monitor metrics from Amazon SNS.

Question 18

A company needs to run a batch data-processing job on Amazon EC2 instances. The job will run during the weekend and will take 90 minutes to finish running. The processing can handle interruptions. The company will run the job every weekend for the next 6 months.

Which EC2 instance purchasing option will meet these requirements MOST cost-effectively?

Options:

A.  

Spot Instances

B.  

Reserved Instances

C.  

On-Demand Instances

D.  

Dedicated Instances

Question 19

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model’s performance?

Options:

A.  

Accuracy

B.  

Area Under the ROC Curve (AUC)

C.  

F1 score

D.  

Mean absolute error (MAE)
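
For reference, mean absolute error (option D) over \(n\) predicted prices is

\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i \right\rvert \]

MAE stays in the same units as the apartment prices and penalizes all errors linearly, whereas accuracy, AUC, and F1 score apply only to classification.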

Question 20

A company has an application that uses different APIs to generate embeddings for input text. The company needs to implement a solution to automatically rotate the API tokens every 3 months.

Which solution will meet this requirement?

Options:

A.  

Store the tokens in AWS Secrets Manager. Create an AWS Lambda function to perform the rotation.

B.  

Store the tokens in AWS Systems Manager Parameter Store. Create an AWS Lambda function to perform the rotation.

C.  

Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS managed key to perform the rotation.

D.  

Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS owned key to perform the rotation.
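
A hedged boto3 sketch of option A; the secret name and rotation Lambda ARN are hypothetical, and the Lambda function itself must implement the Secrets Manager rotation steps:

```python
import boto3

secretsmanager = boto3.client("secretsmanager")

# Turns on automatic rotation every 90 days (3 months); the Lambda function
# implements the createSecret/setSecret/testSecret/finishSecret steps.
secretsmanager.rotate_secret(
    SecretId="embedding-api-tokens",   # hypothetical secret name
    RotationLambdaARN="arn:aws:lambda:us-east-1:111122223333:function:rotate-tokens",
    RotationRules={"AutomaticallyAfterDays": 90},
)
```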

Question 21

A company needs to create a central catalog for all the company's ML models. The models are in AWS accounts where the company developed the models initially. The models are hosted in Amazon Elastic Container Registry (Amazon ECR) repositories.

Which solution will meet these requirements?

Options:

A.  

Configure ECR cross-account replication for each existing ECR repository. Ensure that each model is visible in each AWS account.

B.  

Create a new AWS account with a new ECR repository as the central catalog. Configure ECR cross-account replication between the initial ECR repositories and the central catalog.

C.  

Use the Amazon SageMaker Model Registry to create a model group for models hosted in Amazon ECR. Create a new AWS account. In the new account, use the SageMaker Model Registry as the central catalog. Attach a cross-account resource policy to each model group in the initial AWS accounts.

D.  

Use an AWS Glue Data Catalog to store the models. Run an AWS Glue crawler to migrate the models from the ECR repositories to the Data Catalog. Configure cross-account access to the Data Catalog.

Question 22

An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that minimizes processing time for the data.

Which file format will meet these requirements?

Options:

A.  

CSV files compressed with Snappy

B.  

JSON objects in JSONL format

C.  

JSON files compressed with gzip

D.  

Apache Parquet files

Question 23

A company is building a conversational AI assistant on Amazon Bedrock. The company is using Retrieval Augmented Generation (RAG) to reference the company's internal knowledge base. The AI assistant uses the Anthropic Claude 4 foundation model (FM).

The company needs a solution that uses a vector embedding model, a vector store, and a vector search algorithm.

Which solution will develop the AI assistant with the LEAST development effort?

Options:

A.  

Use Amazon Kendra Experience Builder.

B.  

Use Amazon Aurora PostgreSQL with the pgvector extension.

C.  

Use Amazon RDS for PostgreSQL with the pgvector extension.

D.  

Use the AWS Glue Data Catalog metadata repository.

Question 24

An ML engineer needs to use an Amazon EMR cluster to process large volumes of data in batches. Any data loss is unacceptable.

Which instance purchasing option will meet these requirements MOST cost-effectively?

Options:

A.  

Run the primary node, core nodes, and task nodes on On-Demand Instances.

B.  

Run the primary node, core nodes, and task nodes on Spot Instances.

C.  

Run the primary node on an On-Demand Instance. Run the core nodes and task nodes on Spot Instances.

D.  

Run the primary node and core nodes on On-Demand Instances. Run the task nodes on Spot Instances.

Question 25

A company is developing an application that reads animal descriptions from user prompts and generates images based on the information in the prompts. The application reads a message from an Amazon Simple Queue Service (Amazon SQS) queue. Then the application uses Amazon Titan Image Generator on Amazon Bedrock to generate an image based on the information in the message. Finally, the application removes the message from the SQS queue.

Which IAM permissions should the company assign to the application's IAM role? (Select TWO.)

Options:

A.  

Allow the bedrock:InvokeModel action for the Amazon Titan Image Generator resource.

B.  

Allow the bedrock:Get* action for the Amazon Titan Image Generator resource.

C.  

Allow the sqs:ReceiveMessage action and the sqs:DeleteMessage action for the SQS queue resource.

D.  

Allow the sqs:GetQueueAttributes action and the sqs:DeleteMessage action for the SQS queue resource.

E.  

Allow the sagemaker:PutRecord* action for the Amazon Titan Image Generator resource.
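
A hedged sketch of an inline policy combining the permissions named in options A and C; the role name, queue ARN, and model ARN are hypothetical:

```python
import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Invoke the Amazon Titan Image Generator model on Bedrock
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-image-generator-v1",
        },
        {   # Read messages from the queue and delete them after processing
            "Effect": "Allow",
            "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
            "Resource": "arn:aws:sqs:us-east-1:111122223333:image-prompts",
        },
    ],
}

iam.put_role_policy(
    RoleName="image-app-role",            # hypothetical application role
    PolicyName="bedrock-and-sqs-access",
    PolicyDocument=json.dumps(policy),
)
```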

Question 26

An ML engineer needs to use Amazon SageMaker to fine-tune a large language model (LLM) for text summarization. The ML engineer must follow a low-code no-code (LCNC) approach.

Which solution will meet these requirements?

Options:

A.  

Use SageMaker Studio to fine-tune an LLM that is deployed on Amazon EC2 instances.

B.  

Use SageMaker Autopilot to fine-tune an LLM that is deployed by a custom API endpoint.

C.  

Use SageMaker Autopilot to fine-tune an LLM that is deployed on Amazon EC2 instances.

D.  

Use SageMaker Autopilot to fine-tune an LLM that is deployed by SageMaker JumpStart.

Question 27

A company is developing a generative AI conversational interface to assist customers with payments. The company wants to use an ML solution to detect customer intent. The company does not have training data to train a model.

Which solution will meet these requirements?

Options:

A.  

Fine-tune a sequence-to-sequence (seq2seq) algorithm in Amazon SageMaker JumpStart.

B.  

Use an LLM from Amazon Bedrock with zero-shot learning.

C.  

Use the Amazon Comprehend DetectEntities API.

D.  

Run an LLM from Amazon Bedrock on Amazon EC2 instances.

Question 28

An ML engineer needs to organize a large set of text documents into topics. The ML engineer will not know what the topics are in advance. The ML engineer wants to use built-in algorithms or pre-trained models available through Amazon SageMaker AI to process the documents.

Which solution will meet these requirements?

Options:

A.  

Use the BlazingText algorithm to identify the relevant text and to create a set of topics based on the documents.

B.  

Use the Sequence-to-Sequence algorithm to summarize the text and to create a set of topics based on the documents.

C.  

Use the Object2Vec algorithm to create embeddings and to create a set of topics based on the embeddings.

D.  

Use the Latent Dirichlet Allocation (LDA) algorithm to process the documents and to create a set of topics based on the documents.

Question 29

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a retraining job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

Options:

A.  

Use an AWS Glue crawler and an AWS Glue ETL job to detect data drift. Use AWS Glue triggers to automate the retraining job.

B.  

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the retraining job.

C.  

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the retraining job.

D.  

Use Amazon QuickSight anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the retraining job.

Question 30

A company is gathering audio, video, and text data in various languages. The company needs to use a large language model (LLM) to summarize the gathered data that is in Spanish.

Which solution will meet these requirements in the LEAST amount of time?

Options:

A.  

Train and deploy a model in Amazon SageMaker to convert the data into English text. Train and deploy an LLM in SageMaker to summarize the text.

B.  

Use Amazon Transcribe and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Jurassic model to summarize the text.

C.  

Use Amazon Rekognition and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Anthropic Claude model to summarize the text.

D.  

Use Amazon Comprehend and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Stable Diffusion model to summarize the text.

Question 31

A company needs to ingest data from data sources into Amazon SageMaker Data Wrangler. The data sources are Amazon S3, Amazon Redshift, and Snowflake. The ingested data must always be up to date with the latest changes in the source systems.

Which solution will meet these requirements?

Options:

A.  

Use direct connections to import data from the data sources into Data Wrangler.

B.  

Use cataloged connections to import data from the data sources into Data Wrangler.

C.  

Use AWS Glue to extract data from the data sources. Use AWS Glue also to import the data directly into Data Wrangler.

D.  

Use AWS Lambda to extract data from the data sources. Use Lambda also to import the data directly into Data Wrangler.

Question 32

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

Before the ML engineer trains the model, the ML engineer must resolve the issue of the imbalanced data.

Which solution will meet this requirement with the LEAST operational effort?

Options:

A.  

Use Amazon Athena to identify patterns that contribute to the imbalance. Adjust the dataset accordingly.

B.  

Use Amazon SageMaker Studio Classic built-in algorithms to process the imbalanced dataset.

C.  

Use AWS Glue DataBrew built-in features to oversample the minority class.

D.  

Use the Amazon SageMaker Data Wrangler balance data operation to oversample the minority class.

Question 33

An ML engineer decides to use Amazon SageMaker AI automated model tuning (AMT) for hyperparameter optimization (HPO). The ML engineer requires a tuning strategy that uses regression to slowly and sequentially select the next set of hyperparameters based on previous runs. The strategy must work across small hyperparameter ranges.

Which solution will meet these requirements?

Options:

A.  

Grid search

B.  

Random search

C.  

Bayesian optimization

D.  

Hyperband

Question 34

An ML engineer is analyzing a classification dataset before training a model in Amazon SageMaker AI. The ML engineer suspects that the dataset has a significant imbalance between class labels that could lead to biased model predictions. To confirm class imbalance, the ML engineer needs to select an appropriate pre-training bias metric.

Which metric will meet this requirement?

Options:

A.  

Mean squared error (MSE)

B.  

Difference in proportions of labels (DPL)

C.  

Silhouette score

D.  

Structural similarity index measure (SSIM)
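
For reference, SageMaker Clarify defines the difference in proportions of labels (option B) for two facet groups \(a\) and \(d\) as

\[ \mathrm{DPL} = \frac{n_a^{(1)}}{n_a} - \frac{n_d^{(1)}}{n_d} \]

where \(n_a^{(1)}\) is the number of positive labels among the \(n_a\) members of facet \(a\), and likewise for facet \(d\). Values near zero indicate balanced label proportions; large magnitudes confirm imbalance.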

Question 35

An ML engineer is training an ML model to identify medical patients for disease screening. The tabular dataset for training contains 50,000 patient records: 1,000 with the disease and 49,000 without the disease.

The ML engineer splits the dataset into a training dataset, a validation dataset, and a test dataset.

What should the ML engineer do to transform the data and make the data suitable for training?

Options:

A.  

Apply principal component analysis (PCA) to oversample the minority class in the training dataset.

B.  

Apply Synthetic Minority Oversampling Technique (SMOTE) to generate new synthetic samples of the minority class in the training dataset.

C.  

Randomly oversample the majority class in the validation dataset.

D.  

Apply k-means clustering to undersample the minority class in the test dataset.
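
For illustration, option B's oversampling is commonly done with the third-party imbalanced-learn library (not named in the question); a minimal sketch on a synthetic stand-in dataset, applied only to the training split:

```python
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Synthetic stand-in for the 50,000-record dataset with a 2% positive class.
X_train, y_train = make_classification(
    n_samples=50_000, weights=[0.98, 0.02], random_state=42
)

# SMOTE synthesizes new minority-class samples by interpolating between
# neighboring minority records instead of duplicating them.
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X_train, y_train)
print(sum(y_resampled == 1), "positive samples after oversampling")

# The validation and test splits stay untouched so evaluation reflects
# the true class distribution.
```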

Question 36

An advertising company uses AWS Lake Formation to manage a data lake. The data lake contains structured data and unstructured data. The company's ML engineers are assigned to specific advertisement campaigns.

The ML engineers must interact with the data through Amazon Athena and by browsing the data directly in an Amazon S3 bucket. The ML engineers must have access to only the resources that are specific to their assigned advertisement campaigns.

Which solution will meet these requirements in the MOST operationally efficient way?

Options:

A.  

Configure IAM policies on an AWS Glue Data Catalog to restrict access to Athena based on the ML engineers' campaigns.

B.  

Store users and campaign information in an Amazon DynamoDB table. Configure DynamoDB Streams to invoke an AWS Lambda function to update S3 bucket policies.

C.  

Use Lake Formation to authorize AWS Glue to access the S3 bucket. Configure Lake Formation tags to map ML engineers to their campaigns.

D.  

Configure S3 bucket policies to restrict access to the S3 bucket based on the ML engineers' campaigns.

Question 37

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

The ML engineer needs to use an Amazon SageMaker built-in algorithm to train the model.

Which algorithm should the ML engineer use to meet this requirement?

Options:

A.  

LightGBM

B.  

Linear learner

C.  

K-means clustering

D.  

Neural Topic Model (NTM)

Question 38

A company has a custom extract, transform, and load (ETL) process that runs on premises. The ETL process is written in the R language and runs for an average of 6 hours. The company wants to migrate the process to run on AWS.

Which solution will meet these requirements?

Options:

A.  

Use an AWS Lambda function created from a container image to run the ETL jobs.

B.  

Use Amazon SageMaker AI processing jobs with a custom Docker image stored in Amazon Elastic Container Registry (Amazon ECR).

C.  

Use Amazon SageMaker AI script mode to build a Docker image. Run the ETL jobs by using SageMaker Notebook Jobs.

D.  

Use AWS Glue to prepare and run the ETL jobs.
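
A hedged sketch of option B, running the R-based ETL as a SageMaker processing job from a custom ECR image; the image URI, role, and entrypoint are hypothetical:

```python
from sagemaker.processing import Processor

# The custom image bundles R and the existing ETL code; long-running jobs
# (the 6-hour average) are fine because processing jobs are not time-capped
# the way Lambda functions are.
processor = Processor(
    image_uri="111122223333.dkr.ecr.us-east-1.amazonaws.com/r-etl:latest",  # hypothetical
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.4xlarge",
    entrypoint=["Rscript", "/opt/etl/run_etl.R"],   # hypothetical entrypoint
)

processor.run()
```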

Question 39

A company uses a hybrid cloud environment. A model that is deployed on premises uses data in Amazon S3 to provide customers with a live conversational engine.

The model is using sensitive data. An ML engineer needs to implement a solution to identify and remove the sensitive data.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Deploy the model on Amazon SageMaker. Create a set of AWS Lambda functions to identify and remove the sensitive data.

B.  

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Create an AWS Batch job to identify and remove the sensitive data.

C.  

Use Amazon Macie to identify the sensitive data. Create a set of AWS Lambda functions to remove the sensitive data.

D.  

Use Amazon Comprehend to identify the sensitive data. Launch Amazon EC2 instances to remove the sensitive data.

Question 40

A company has deployed an ML model that detects fraudulent credit card transactions in real time in a banking application. The model uses Amazon SageMaker Asynchronous Inference. Consumers are reporting delays in receiving the inference results.

An ML engineer needs to implement a solution to improve the inference performance. The solution also must provide a notification when a deviation in model quality occurs.

Which solution will meet these requirements?

Options:

A.  

Use SageMaker real-time inference for inference. Use SageMaker Model Monitor for notifications about model quality.

B.  

Use SageMaker batch transform for inference. Use SageMaker Model Monitor for notifications about model quality.

C.  

Use SageMaker Serverless Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

D.  

Keep using SageMaker Asynchronous Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

Question 41

A company wants to build an anomaly detection ML model. The model will use large-scale tabular data that is stored in an Amazon S3 bucket. The company does not have expertise in Python, Spark, or other languages for ML.

An ML engineer needs to transform and prepare the data for ML model training.

Which solution will meet these requirements?

Options:

A.  

Prepare the data by using Amazon EMR Serverless applications that host Amazon SageMaker Studio notebooks.

B.  

Prepare the data by using the Amazon SageMaker Data Wrangler visual interface in Amazon SageMaker Canvas.

C.  

Run SQL queries from a JupyterLab space in Amazon SageMaker Studio. Process the data further by using pandas DataFrames.

D.  

Prepare the data by using a JupyterLab notebook in Amazon SageMaker Studio.

Question 42

A company wants to improve its customer retention ML model. The current model has 85% accuracy and a new model shows 87% accuracy in testing. The company wants to validate the new model’s performance in production.

Which solution will meet these requirements?

Options:

A.  

Deploy the new model for 4 weeks across all production traffic. Monitor performance metrics and validate improvements.

B.  

Run A/B testing on both models for 4 weeks. Route 20% of traffic to the new model. Monitor customer retention rates across both variants.

C.  

Run both models in parallel for 4 weeks. Analyze offline predictions weekly by using historical customer data analysis.

D.  

Implement alternating deployments for 4 weeks between the current model and the new model. Track performance metrics for comparison.

Question 43

An ML engineer is using an Amazon SageMaker AI shadow test to evaluate a new model that is hosted on a SageMaker AI endpoint. The shadow test requires significant GPU resources for high performance. The production variant currently runs on a less powerful instance type.

The ML engineer needs to configure the shadow test to use a higher performance instance type for a shadow variant. The solution must not affect the instance type of the production variant.

Which solution will meet these requirements?

Options:

A.  

Modify the existing ProductionVariant configuration in the endpoint to include a ShadowProductionVariants list. Specify the larger instance type for the shadow variant.

B.  

Create a new endpoint configuration with two ProductionVariant definitions. Configure one definition for the existing production variant and one definition for the shadow variant with the larger instance type. Use the UpdateEndpoint action to apply the new configuration.

C.  

Create a separate SageMaker AI endpoint for the shadow variant that uses the larger instance type. Create an AWS Lambda function that routes a portion of the traffic to the shadow endpoint. Assign the Lambda function to the original endpoint.

D.  

Use the CreateEndpointConfig action to define a new configuration. Specify the existing production variant in the configuration and add a separate ShadowProductionVariants list. Specify the larger instance type for the shadow variant. Use the CreateEndpoint action and pass the new configuration to the endpoint.
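
For context, a hedged boto3 sketch of the CreateEndpointConfig shape described in option D; the config name, models, and instance types are hypothetical:

```python
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_endpoint_config(
    EndpointConfigName="recs-with-shadow",
    ProductionVariants=[{
        "VariantName": "production",
        "ModelName": "recs-model-v1",
        "InstanceType": "ml.m5.xlarge",     # production instance type unchanged
        "InitialInstanceCount": 1,
    }],
    # The shadow variant receives a copy of the traffic, but its responses
    # are only logged, never returned to callers.
    ShadowProductionVariants=[{
        "VariantName": "shadow",
        "ModelName": "recs-model-v2",
        "InstanceType": "ml.g5.2xlarge",    # larger GPU instance for the shadow test
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0,
    }],
)

# sagemaker.create_endpoint(EndpointName="recs", EndpointConfigName="recs-with-shadow")
```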

Question 44

A company has multiple models that are hosted on Amazon SageMaker AI. The models need to be retrained. The requirements for each model are different, so the company needs to choose different deployment strategies to transfer all requests to a new model.

Select the correct strategy from the following list for each requirement. Select each strategy one time. (Select THREE.)

• Canary traffic shifting

• Linear traffic shifting guardrail

• All at once traffic shifting

Options:

Question 45

A company is using Amazon SageMaker to create ML models. The company's data scientists need fine-grained control of the ML workflows that they orchestrate. The data scientists also need the ability to visualize SageMaker jobs and workflows as a directed acyclic graph (DAG). The data scientists must keep a running history of model discovery experiments and must establish model governance for auditing and compliance verifications.

Which solution will meet these requirements?

Options:

A.  

Use AWS CodePipeline and its integration with SageMaker Studio to manage the entire ML workflows. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.

B.  

Use AWS CodePipeline and its integration with SageMaker Experiments to manage the entire ML workflows. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.

C.  

Use SageMaker Pipelines and its integration with SageMaker Studio to manage the entire ML workflows. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.

D.  

Use SageMaker Pipelines and its integration with SageMaker Experiments to manage the entire ML workflows. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.
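
A minimal sketch of a SageMaker pipeline whose steps render as a DAG in SageMaker Studio; the estimator, script, and role are hypothetical:

```python
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep
from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point="train.py",   # hypothetical training script
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    instance_type="ml.m5.xlarge",
    framework_version="1.2-1",
)

train_step = TrainingStep(name="TrainModel", estimator=estimator)

# Steps and their dependencies define the DAG that Studio visualizes;
# each execution is recorded, which supports lineage and experiment history.
pipeline = Pipeline(name="churn-pipeline", steps=[train_step])
pipeline.upsert(role_arn="arn:aws:iam::111122223333:role/SageMakerRole")
pipeline.start()
```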

Question 46

A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive.

A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database.

Which solution will meet these requirements with the LEAST implementation effort?

Options:

A.  

Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.

B.  

Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.

C.  

Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.

D.  

Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.

Question 47

A company wants to develop an ML model by using tabular data from its customers. The data contains meaningful ordered features with sensitive information that should not be discarded. An ML engineer must ensure that the sensitive data is masked before another team starts to build the model.

Which solution will meet these requirements?

Options:

A.  

Use Amazon Macie to categorize the sensitive data.

B.  

Prepare the data by using AWS Glue DataBrew.

C.  

Run an AWS Batch job to change the sensitive data to random values.

D.  

Run an Amazon EMR job to change the sensitive data to random values.

Question 48

A company has deployed an XGBoost prediction model in production to predict if a customer is likely to cancel a subscription. The company uses Amazon SageMaker Model Monitor to detect deviations in the F1 score.

During a baseline analysis of model quality, the company recorded a threshold for the F1 score. After several months of no change, the model's F1 score decreases significantly.

What could be the reason for the reduced F1 score?

Options:

A.  

Concept drift occurred in the underlying customer data that was used for predictions.

B.  

The model was not sufficiently complex to capture all the patterns in the original baseline data.

C.  

The original baseline data had a data quality issue of missing values.

D.  

Incorrect ground truth labels were provided to Model Monitor during the calculation of the baseline.

Question 49

A company has used Amazon SageMaker to deploy a predictive ML model in production. The company is using SageMaker Model Monitor on the model. After a model update, an ML engineer notices data quality issues in the Model Monitor checks.

What should the ML engineer do to mitigate the data quality issues that Model Monitor has identified?

Options:

A.  

Adjust the model's parameters and hyperparameters.

B.  

Initiate a manual Model Monitor job that uses the most recent production data.

C.  

Create a new baseline from the latest dataset. Update Model Monitor to use the new baseline for evaluations.

D.  

Include additional data in the existing training set for the model. Retrain and redeploy the model.

Question 50

An ML engineer has developed a binary classification model outside of Amazon SageMaker. The ML engineer needs to make the model accessible to a SageMaker Canvas user for additional tuning.

The model artifacts are stored in an Amazon S3 bucket. The ML engineer and the Canvas user are part of the same SageMaker domain.

Which combination of requirements must be met so that the ML engineer can share the model with the Canvas user? (Choose two.)

Options:

A.  

The ML engineer and the Canvas user must be in separate SageMaker domains.

B.  

The Canvas user must have permissions to access the S3 bucket where the model artifacts are stored.

C.  

The model must be registered in the SageMaker Model Registry.

D.  

The ML engineer must host the model on AWS Marketplace.

E.  

The ML engineer must deploy the model to a SageMaker endpoint.

Question 51

A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.

Which solution will meet these requirements?

Options:

A.  

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

B.  

Use a custom Amazon SageMaker notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

C.  

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

D.  

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Question 52

A company ingests sales transaction data using Amazon Data Firehose into Amazon OpenSearch Service. The Firehose buffer interval is set to 60 seconds.

The company needs sub-second latency for a real-time OpenSearch dashboard.

Which architectural change will meet this requirement?

Options:

A.  

Use zero buffering in the Firehose stream and tune the PutRecordBatch batch size.

B.  

Replace Firehose with AWS DataSync and enhanced fan-out consumers.

C.  

Increase the Firehose buffer interval to 120 seconds.

D.  

Replace Firehose with Amazon SQS.

Question 53

A company uses an Amazon SageMaker AI model for real-time inference with auto scaling enabled. During peak usage, new instances launch before existing instances are fully ready, causing inefficiencies and delays.

Which solution will optimize the scaling process without affecting response times?

Options:

A.  

Change to a multi-model endpoint configuration.

B.  

Integrate Amazon API Gateway and AWS Lambda to manage invocations.

C.  

Decrease the scale-in cooldown period and increase the maximum instance count.

D.  

Increase the cooldown period after scale-out activities.
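
As a hedged sketch, option D's longer scale-out cooldown is set through Application Auto Scaling; the endpoint and variant names and the numbers are hypothetical:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "endpoint/recs-endpoint/variant/AllTraffic"   # hypothetical

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=2,
    MaxCapacity=10,
)

autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        # A longer scale-out cooldown lets new instances finish initializing
        # before another scaling activity is triggered.
        "ScaleOutCooldown": 600,
        "ScaleInCooldown": 300,
    },
)
```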

Question 54

A company has a binary classification model in production. An ML engineer needs to develop a new version of the model.

The new model version must maximize correct predictions of positive labels and negative labels. The ML engineer must use a metric to recalibrate the model to meet these requirements.

Which metric should the ML engineer use for the model recalibration?

Options:

A.  

Accuracy

B.  

Precision

C.  

Recall

D.  

Specificity

Question 55

An ML engineer is preparing a dataset that contains medical records to train an ML model to predict the likelihood of patients developing diseases.

The dataset contains columns for patient ID, age, medical conditions, test results, and a "Disease" target column.

How should the ML engineer configure the data to train the model?

Options:

A.  

Remove the patient ID column.

B.  

Remove the age column.

C.  

Remove the medical conditions and test results columns.

D.  

Remove the "Disease" target column.

Question 56

An ML engineer has a custom container that performs k-fold cross-validation and logs an average F1 score during training. The ML engineer wants Amazon SageMaker AI Automatic Model Tuning (AMT) to select hyperparameters that maximize the average F1 score.

How should the ML engineer integrate the custom metric into SageMaker AI AMT?

Options:

A.  

Define the average F1 score in the TrainingInputMode parameter.

B.  

Define a metric definition in the tuning job that uses a regular expression to capture the average F1 score from the training logs.

C.  

Publish the average F1 score as a custom Amazon CloudWatch metric.

D.  

Write the F1 score to a JSON file in Amazon S3 and reference it in ObjectiveMetricName.
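
A minimal sketch of option B with the SageMaker Python SDK; the container URI, role, hyperparameter range, and log line format are hypothetical, and the regex must match what the custom container actually prints:

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

# The custom k-fold cross-validation container; URI and role are hypothetical.
estimator = Estimator(
    image_uri="111122223333.dkr.ecr.us-east-1.amazonaws.com/cv-trainer:latest",
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# The regex captures the value from a log line assumed to look like:
#   average F1 score: 0.8731
metric_definitions = [{"Name": "average:f1", "Regex": "average F1 score: ([0-9\\.]+)"}]

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="average:f1",
    objective_type="Maximize",
    metric_definitions=metric_definitions,
    hyperparameter_ranges={"learning_rate": ContinuousParameter(1e-4, 1e-1)},
    max_jobs=20,
    max_parallel_jobs=2,
)
# tuner.fit({"train": "s3://my-bucket/train/"})
```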

Question 57

A company wants to improve the sustainability of its ML operations.

Which actions will reduce the energy usage and computational resources that are associated with the company's training jobs? (Choose two.)

Options:

A.  

Use Amazon SageMaker Debugger to stop training jobs when non-converging conditions are detected.

B.  

Use Amazon SageMaker Ground Truth for data labeling.

C.  

Deploy models by using AWS Lambda functions.

D.  

Use AWS Trainium instances for training.

E.  

Use PyTorch or TensorFlow with the distributed training option.

Question 58

A company has a conversational AI assistant that sends requests through Amazon Bedrock to an Anthropic Claude large language model (LLM). Users report that when they ask similar questions multiple times, they sometimes receive different answers. An ML engineer needs to improve the responses to be more consistent and less random.

Which solution will meet these requirements?

Options:

A.  

Increase the temperature parameter and the top_k parameter.

B.  

Increase the temperature parameter. Decrease the top_k parameter.

C.  

Decrease the temperature parameter. Increase the top_k parameter.

D.  

Decrease the temperature parameter and the top_k parameter.
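
For illustration, a hedged sketch of option D's lower sampling parameters through the Bedrock Converse API; the model ID and values are hypothetical, and top_k for Claude is passed as an additional model-specific field:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # hypothetical model ID
    messages=[{"role": "user", "content": [{"text": "Summarize our refund policy."}]}],
    # Lower temperature narrows sampling toward the most likely tokens.
    inferenceConfig={"temperature": 0.1},
    # top_k is Anthropic-specific, so it goes in additionalModelRequestFields.
    additionalModelRequestFields={"top_k": 10},
)

print(response["output"]["message"]["content"][0]["text"])
```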

Question 59

A company has an ML model that generates text descriptions based on images that customers upload to the company's website. The images can be up to 50 MB in total size.

An ML engineer decides to store the images in an Amazon S3 bucket. The ML engineer must implement a processing solution that can scale to accommodate changes in demand.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Create an Amazon SageMaker batch transform job to process all the images in the S3 bucket.

B.  

Create an Amazon SageMaker Asynchronous Inference endpoint and a scaling policy. Run a script to make an inference request for each image.

C.  

Create an Amazon Elastic Kubernetes Service (Amazon EKS) cluster that uses Karpenter for auto scaling. Host the model on the EKS cluster. Run a script to make an inference request for each image.

D.  

Create an AWS Batch job that uses an Amazon Elastic Container Service (Amazon ECS) cluster. Specify a list of images to process for each AWS Batch job.

Question 60

A company has implemented a data ingestion pipeline for sales transactions from its ecommerce website. The company uses Amazon Data Firehose to ingest data into Amazon OpenSearch Service. The buffer interval of the Firehose stream is set for 60 seconds. An OpenSearch linear model generates real-time sales forecasts based on the data and presents the data in an OpenSearch dashboard.

The company needs to optimize the data ingestion pipeline to support sub-second latency for the real-time dashboard.

Which change to the architecture will meet these requirements?

Options:

A.  

Use zero buffering in the Firehose stream. Tune the batch size that is used in the PutRecordBatch operation.

B.  

Replace the Firehose stream with an AWS DataSync task. Configure the task with enhanced fan-out consumers.

C.  

Increase the buffer interval of the Firehose stream from 60 seconds to 120 seconds.

D.  

Replace the Firehose stream with an Amazon Simple Queue Service (Amazon SQS) queue.
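
A hedged boto3 sketch of option A's zero-buffering configuration, assuming the OpenSearch destination supports Firehose zero buffering; the stream name, ARNs, and index are hypothetical:

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="sales-to-opensearch",   # hypothetical
    DeliveryStreamType="DirectPut",
    AmazonopensearchserviceDestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/FirehoseRole",
        "DomainARN": "arn:aws:es:us-east-1:111122223333:domain/sales",
        "IndexName": "sales",
        # IntervalInSeconds=0 enables zero buffering so records are
        # delivered as they arrive instead of every 60 seconds.
        "BufferingHints": {"IntervalInSeconds": 0, "SizeInMBs": 1},
    },
)
```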

Question 61

An ML engineer is using Amazon SageMaker AI to train an ML model. The ML engineer needs to use SageMaker AI automatic model tuning (AMT) features to tune the model hyperparameters over a large parameter space.

The model has 20 categorical hyperparameters and 7 continuous hyperparameters that can be tuned. The ML engineer needs to run the tuning job a maximum of 1,000 times. The ML engineer must ensure that each parameter trial is built based on the performance of the previous trial.

Which solution will meet these requirements?

Options:

A.  

Define the search space as categorical parameters of 1,000 possible combinations. Use grid search.

B.  

Define the search space as continuous parameters. Use random search. Set the maximum number of tuning jobs to 1,000.

C.  

Define the search space as categorical parameters and continuous parameters. Use Bayesian optimization. Set the maximum number of training jobs to 1,000.

D.  

Define the search space as categorical parameters and continuous parameters. Use grid search. Set the maximum number of tuning jobs to 1,000.
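
A minimal sketch of option C's configuration with the SageMaker Python SDK; the container, role, metric, and the two example hyperparameters shown are hypothetical stand-ins for the full 27-parameter search space:

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import (
    HyperparameterTuner, CategoricalParameter, ContinuousParameter,
)

# Custom training container and role are hypothetical.
estimator = Estimator(
    image_uri="111122223333.dkr.ecr.us-east-1.amazonaws.com/trainer:latest",
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

hyperparameter_ranges = {
    "optimizer": CategoricalParameter(["sgd", "adam", "rmsprop"]),  # one of the 20 categorical
    "learning_rate": ContinuousParameter(1e-5, 1e-1),               # one of the 7 continuous
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    metric_definitions=[{"Name": "validation:accuracy", "Regex": "val_acc=([0-9\\.]+)"}],
    hyperparameter_ranges=hyperparameter_ranges,
    strategy="Bayesian",    # each trial is proposed from the results of previous trials
    max_jobs=1000,          # the 1,000-run budget
    max_parallel_jobs=1,    # fully sequential, so every trial informs the next
)
# tuner.fit({"train": "s3://my-bucket/train/"})
```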

Question 62

An ML engineer needs to process thousands of existing CSV objects and new CSV objects that are uploaded. The CSV objects are stored in a central Amazon S3 bucket and have the same number of columns. One of the columns is a transaction date. The ML engineer must query the data based on the transaction date.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Use an Amazon Athena CREATE TABLE AS SELECT (CTAS) statement to create a table based on the transaction date from data in the central S3 bucket. Query the objects from the table.

B.  

Create a new S3 bucket for processed data. Set up S3 replication from the central S3 bucket to the new S3 bucket. Use S3 Object Lambda to query the objects based on transaction date.

C.  

Create a new S3 bucket for processed data. Use AWS Glue for Apache Spark to create a job to query the CSV objects based on transaction date. Configure the job to store the results in the new S3 bucket. Query the objects from the new S3 bucket.

D.  

Create a new S3 bucket for processed data. Use Amazon Data Firehose to transfer the data from the central S3 bucket to the new S3 bucket. Configure Firehose to run an AWS Lambda function to query the data based on transaction date.
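
For context, a hedged sketch of option A's CTAS statement issued through boto3; the database, table names, and S3 locations are hypothetical:

```python
import boto3

athena = boto3.client("athena")

# CTAS writes the results as Parquet partitioned by transaction date, so
# later queries can prune by date instead of scanning every CSV object.
athena.start_query_execution(
    QueryString="""
        CREATE TABLE sales.transactions_by_date
        WITH (
            format = 'PARQUET',
            partitioned_by = ARRAY['transaction_date'],
            external_location = 's3://central-bucket/processed/'
        ) AS
        SELECT * FROM sales.raw_transactions
    """,
    QueryExecutionContext={"Database": "sales"},
    ResultConfiguration={"OutputLocation": "s3://central-bucket/athena-results/"},
)
```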
