AWS Certified Machine Learning Engineer - Associate Question and Answers

Last Update May 30, 2026
Total Questions : 241

Questions 1

An ML engineer has developed a binary classification model outside of Amazon SageMaker. The ML engineer needs to make the model accessible to a SageMaker Canvas user for additional tuning.

The model artifacts are stored in an Amazon S3 bucket. The ML engineer and the Canvas user are part of the same SageMaker domain.

Which combination of requirements must be met so that the ML engineer can share the model with the Canvas user? (Choose two.)

Options:

A.  

The ML engineer and the Canvas user must be in separate SageMaker domains.

B.  

The Canvas user must have permissions to access the S3 bucket where the model artifacts are stored.

C.  

The model must be registered in the SageMaker Model Registry.

D.  

The ML engineer must host the model on AWS Marketplace.

E.  

The ML engineer must deploy the model to a SageMaker endpoint.

Questions 2

A company uses an Amazon EMR cluster to run a data ingestion process for an ML model. An ML engineer notices that the processing time is increasing.

Which solution will reduce the processing time MOST cost-effectively?

Options:

A.  

Use Spot Instances to increase the number of primary nodes.

B.  

Use Spot Instances to increase the number of core nodes.

C.  

Use Spot Instances to increase the number of task nodes.

D.  

Use On-Demand Instances to increase the number of core nodes.

Questions 3

An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features. The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:

• Feature splitting

• Logarithmic transformation

• One-hot encoding

• Standardized distribution

Select the correct feature engineering techniques for the following list of features. Each feature engineering technique should be selected one time or not at all. (Select three.)
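
For reference, the four techniques named above can be sketched in plain Python. This is a minimal illustration, not a SageMaker API: the function names and the sample values are hypothetical, and the feature list for this question is not reproduced on the page.

```python
import math

def split_feature(value, sep=" "):
    """Feature splitting: break one compound value into separate parts."""
    return value.split(sep)

def log_transform(values):
    """Logarithmic transformation: log(1 + x) compresses a right-skewed range."""
    return [math.log1p(v) for v in values]

def one_hot(values):
    """One-hot encoding: one 0/1 column per distinct category (sorted order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def standardize(values):
    """Standardization: rescale to zero mean and unit population std deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

# Hypothetical sample values, chosen only to show the shape of each output.
print(split_feature("Seattle WA"))
print(one_hot(["brick", "wood", "brick"]))
print(standardize([1.0, 2.0, 3.0]))
print(log_transform([0, 9, 99]))
```

As a rough guide, splitting suits compound strings, the log transform suits heavily skewed positive numbers, one-hot encoding suits low-cardinality categories, and standardization suits numeric features on different scales.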

Options:

Questions 4

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a re-training job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

Options:

A.  

Use an AWS Glue crawler and an AWS Glue extract, transform, and load (ETL) job to detect data drift. Use AWS Glue triggers to automate the retraining job.

B.  

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the re-training job.

C.  

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the re-training job.

D.  

Use Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the re-training job.

Questions 5

An ML engineer wants to deploy an Amazon SageMaker AI model for inference. The payload sizes are less than 3 MB. Processing time does not exceed 45 seconds. The traffic patterns will be irregular or unpredictable.

Which inference option will meet these requirements MOST cost-effectively?

Options:

A.  

Asynchronous inference

B.  

Real-time inference

C.  

Serverless inference

D.  

Batch transform

Questions 6

An ML engineer wants to use Amazon SageMaker Data Wrangler to perform preprocessing on a dataset. The ML engineer wants to use the processed dataset to train a classification model. During preprocessing, the ML engineer notices that a text feature has a range of thousands of values that differ only by spelling errors. The ML engineer needs to apply an encoding method so that after preprocessing is complete, the text feature can be used to train the model.

Which solution will meet these requirements?

Options:

A.  

Perform ordinal encoding to represent categories of the feature.

B.  

Perform similarity encoding to represent categories of the feature.

C.  

Perform one-hot encoding to represent categories of the feature.

D.  

Perform target encoding to represent categories of the feature.
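
To make the idea behind similarity encoding concrete, the sketch below scores strings by the overlap of their character 3-grams, so that misspellings of the same category get similar vectors. It illustrates the general technique only and is not Data Wrangler's implementation; the function names and sample strings are hypothetical.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a padded, lowercased string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Jaccard similarity of the two strings' n-gram sets, in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)

def similarity_encode(values, prototypes, n=3):
    """Encode each value as its similarity to every prototype category."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]

# Hypothetical categories and misspelled inputs.
prototypes = ["california", "texas"]
rows = similarity_encode(["califronia", "texas", "texsa"], prototypes)
for row in rows:
    print([round(x, 2) for x in row])
```

Unlike one-hot encoding, which would give every misspelling its own unrelated column, each row here stays close to the category the string was meant to represent.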

Questions 7

A credit card company has a fraud detection model in production on an Amazon SageMaker endpoint. The company develops a new version of the model. The company needs to assess the new model's performance by using live data and without affecting production end users.

Which solution will meet these requirements?

Options:

A.  

Set up SageMaker Debugger and create a custom rule.

B.  

Set up blue/green deployments with all-at-once traffic shifting.

C.  

Set up blue/green deployments with canary traffic shifting.

D.  

Set up shadow testing with a shadow variant of the new model.

Questions 8

A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running.

How should the company deploy the model on Amazon SageMaker to meet these requirements?

Options:

A.  

Use a multi-model serverless endpoint. Enable caching.

B.  

Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.

C.  

Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.

D.  

Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.
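
For context on the MaxConcurrency parameter mentioned in option D: in the SageMaker API it sits inside the ServerlessConfig of a production variant. The sketch below only builds the request dictionary and makes no AWS call. The helper name, the config and model names, and the list of allowed memory sizes are assumptions to check against the current documentation.

```python
# Memory sizes assumed from the SageMaker documentation at the time of writing.
ALLOWED_MEMORY_MB = {1024, 2048, 3072, 4096, 5120, 6144}

def serverless_endpoint_config(config_name, model_name, memory_mb=2048, max_concurrency=1):
    """Build a CreateEndpointConfig request body for a serverless variant."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"unsupported memory size: {memory_mb}")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }

# Hypothetical names; with boto3 installed and credentials configured, the dict
# could be passed as boto3.client("sagemaker").create_endpoint_config(**request).
request = serverless_endpoint_config("nightly-stock-config", "stock-model")
print(request["ProductionVariants"][0]["ServerlessConfig"])
```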

Questions 9

An ML engineer is tuning an image classification model that performs poorly on one of two classes. The poorly performing class represents an extremely small fraction of the training dataset.

Which solution will improve the model’s performance?

Options:

A.  

Optimize for accuracy. Use image augmentation on the less common images.

B.  

Optimize for F1 score. Use image augmentation on the less common images.

C.  

Optimize for accuracy. Use SMOTE to generate synthetic images.

D.  

Optimize for F1 score. Use SMOTE to generate synthetic images.

Questions 10

A company has a custom extract, transform, and load (ETL) process that runs on premises. The ETL process is written in the R language and runs for an average of 6 hours. The company wants to migrate the process to run on AWS.

Which solution will meet these requirements?

Options:

A.  

Use an AWS Lambda function created from a container image to run the ETL jobs.

B.  

Use Amazon SageMaker AI processing jobs with a custom Docker image stored in Amazon Elastic Container Registry (Amazon ECR).

C.  

Use Amazon SageMaker AI script mode to build a Docker image. Run the ETL jobs by using SageMaker Notebook Jobs.

D.  

Use AWS Glue to prepare and run the ETL jobs.

Questions 11

A company wants to deploy an Amazon SageMaker AI model that can queue requests. The model needs to handle payloads of up to 1 GB that take up to 1 hour to process. The model must return an inference for each request. The model also must scale down when no requests are available to process.

Which inference option will meet these requirements?

Options:

A.  

Asynchronous inference

B.  

Batch transform

C.  

Serverless inference

D.  

Real-time inference

Questions 12

An ML engineer is building an ML model in Amazon SageMaker AI. The ML engineer needs to load historical data directly from Amazon S3, Amazon Athena, and Snowflake into SageMaker AI.

Which solution will meet this requirement?

Options:

A.  

Use AWS Glue DataBrew to import the data into SageMaker AI.

B.  

Build a pipeline in SageMaker Pipelines to process the data. Use AWS DataSync to load the processed data into SageMaker AI.

C.  

Create a feature store in SageMaker Feature Store. Use an Apache Spark connector to Feature Store to access the data.

D.  

Use SageMaker Data Wrangler to query and import the data.

Questions 13

An ML engineer is deploying a generative AI model-based customer support agent that uses Amazon SageMaker AI for inference. The customer support agent must respond to customer questions about topics such as shipping policies, refund processes, and account management. The generative AI model generates one token at a time.

Customers report dissatisfaction with how long the customer support agent takes to generate lengthy responses to questions. The ML engineer must apply an inference optimization technique to improve the performance of the customer support agent.

Which solution will meet this requirement?

Options:

A.  

Compilation

B.  

Speculative decoding

C.  

Quantization

D.  

Fast model loading

Questions 14

An ML engineer is collecting data to train a classification ML model by using Amazon SageMaker AI. The target column can have two possible values: Class A or Class B. The ML engineer wants to ensure that the number of samples for both Class A and Class B are balanced, without losing any existing training data. The ML engineer must test the balance of the training data.

Which solution will meet this requirement?

Options:

A.  

Use SageMaker Clarify to check for class imbalance (CI). If the value is equal to 0, then use random undersampling in SageMaker Data Wrangler to balance the classes.

B.  

Use SageMaker Clarify to check for class imbalance (CI). If the value is greater than 0, then use synthetic minority oversampling technique (SMOTE) in SageMaker Data Wrangler to balance the classes.

C.  

Use SageMaker JumpStart to generate a class imbalance (CI) report. If the value is greater than 0, then use random undersampling in SageMaker Studio to balance the classes.

D.  

Use SageMaker JumpStart to generate a class imbalance (CI) report. If the value is equal to 0, then use synthetic minority oversampling technique (SMOTE) in SageMaker Studio to balance the classes.
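
The class imbalance (CI) value referred to in the options can be computed by hand. SageMaker Clarify defines CI as (n_a - n_d) / (n_a + n_d) over two groups, giving a value between -1 and 1 where 0 means perfectly balanced. Applying it to the Class A and Class B counts below is an assumption for illustration, and the counts themselves are hypothetical.

```python
def class_imbalance(n_a, n_d):
    """Return CI = (n_a - n_d) / (n_a + n_d), a value in [-1, 1]."""
    total = n_a + n_d
    if total == 0:
        raise ValueError("at least one sample is required")
    return (n_a - n_d) / total

# Hypothetical sample counts for Class A and Class B.
print(class_imbalance(500, 500))  # balanced
print(class_imbalance(900, 100))  # Class A dominates
```

A value far from 0 indicates that one class dominates. Oversampling the smaller class moves the value toward 0 without discarding existing rows, whereas undersampling achieves balance by removing rows.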

Questions 15

A gaming company needs to deploy a natural language processing (NLP) model to moderate a chat forum in a game. The workload experiences heavy usage during evenings and weekends but minimal activity during other hours.

Which solution will meet these requirements MOST cost-effectively?

Options:

A.  

Use an Amazon SageMaker AI batch transform job with fixed capacity.

B.  

Use Amazon SageMaker Serverless Inference.

C.  

Use a single Amazon EC2 GPU instance with reserved capacity.

D.  

Use Amazon SageMaker Asynchronous Inference.

Questions 16

A company uses Amazon SageMaker Studio to develop an ML model. The company has a single SageMaker Studio domain. An ML engineer needs to implement a solution that provides an automated alert when SageMaker AI compute costs reach a specific threshold.

Which solution will meet these requirements?

Options:

A.  

Add resource tagging by editing the SageMaker AI user profile in the SageMaker AI domain. Configure AWS Cost Explorer to send an alert when the threshold is reached.

B.  

Add resource tagging by editing the SageMaker AI user profile in the SageMaker AI domain. Configure AWS Budgets to send an alert when the threshold is reached.

C.  

Add resource tagging by editing each user's IAM profile. Configure AWS Cost Explorer to send an alert when the threshold is reached.

D.  

Add resource tagging by editing each user's IAM profile. Configure AWS Budgets to send an alert when the threshold is reached.

Questions 17

A company is developing an ML model by using Amazon SageMaker AI. The company must monitor bias in the model and display the results on a dashboard. An ML engineer creates a bias monitoring job.

How should the ML engineer capture bias metrics to display on the dashboard?

Options:

A.  

Capture AWS CloudTrail metrics from SageMaker Clarify.

B.  

Capture Amazon CloudWatch metrics from SageMaker Clarify.

C.  

Capture SageMaker Model Monitor metrics from Amazon EventBridge.

D.  

Capture SageMaker Model Monitor metrics from Amazon SNS.

Questions 18

An ML engineer develops a neural network model to predict whether customers will continue to subscribe to a service. The model performs well on training data. However, the accuracy of the model decreases significantly on evaluation data.

The ML engineer must resolve the model performance issue.

Which solution will meet this requirement?

Options:

A.  

Penalize large weights by using L1 or L2 regularization.

B.  

Remove dropout layers from the neural network.

C.  

Train the model for longer by increasing the number of epochs.

D.  

Capture complex patterns by increasing the number of layers.

Questions 19

A bank needs to use Amazon SageMaker AI to create an ML model to determine which customers qualify for a new product. The bank must use algorithms that SageMaker AI directly supports. The model must be explainable to the bank's regulators.

Which modeling approach will meet these requirements?

Options:

A.  

Train the model by using the Object2Vec algorithm.

B.  

Train the model by using the linear learner algorithm.

C.  

Train a neural network.

D.  

Train the model by using the k-means algorithm.

Questions 20

A company has a binary classification model in production. An ML engineer needs to develop a new version of the model.

The new model version must maximize correct predictions of positive labels and negative labels. The ML engineer must use a metric to recalibrate the model to meet these requirements.

Which metric should the ML engineer use for the model recalibration?

Options:

A.  

Accuracy

B.  

Precision

C.  

Recall

D.  

Specificity
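
The four metrics listed in the options differ only in which cells of the confusion matrix they use. The sketch below computes each one from raw counts; the function name and the example counts are hypothetical.

```python
def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def classification_metrics(tp, fp, tn, fn):
    """Compute binary classification metrics from confusion-matrix counts."""
    return {
        # Share of all predictions, positive and negative, that are correct.
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        # Share of predicted positives that are truly positive.
        "precision": _ratio(tp, tp + fp),
        # Share of actual positives that were found (true positive rate).
        "recall": _ratio(tp, tp + fn),
        # Share of actual negatives that were found (true negative rate).
        "specificity": _ratio(tn, tn + fp),
    }

# Hypothetical counts for 100 predictions.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```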

Questions 21

A company uses a hybrid cloud environment. A model that is deployed on premises uses data in Amazon S3 to provide customers with a live conversational engine.

The model is using sensitive data. An ML engineer needs to implement a solution to identify and remove the sensitive data.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Deploy the model on Amazon SageMaker. Create a set of AWS Lambda functions to identify and remove the sensitive data.

B.  

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Create an AWS Batch job to identify and remove the sensitive data.

C.  

Use Amazon Macie to identify the sensitive data. Create a set of AWS Lambda functions to remove the sensitive data.

D.  

Use Amazon Comprehend to identify the sensitive data. Launch Amazon EC2 instances to remove the sensitive data.

Questions 22

A company uses Amazon SageMaker Studio to develop an ML model. The company has a single SageMaker Studio domain. An ML engineer needs to implement a solution that provides an automated alert when SageMaker compute costs reach a specific threshold.

Which solution will meet these requirements?

Options:

A.  

Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Cost Explorer to send an alert when the threshold is reached.

B.  

Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Budgets to send an alert when the threshold is reached.

C.  

Add resource tagging by editing each user's IAM profile. Configure AWS Cost Explorer to send an alert when the threshold is reached.

D.  

Add resource tagging by editing each user's IAM profile. Configure AWS Budgets to send an alert when the threshold is reached.

Questions 23

A music streaming company constantly streams song ratings from an application to an Amazon S3 bucket. The company wants to use the ratings as an input for training and inference of an Amazon SageMaker AI model.

The company has an AWS Glue Data Catalog that is configured with the S3 bucket as the source. An ML engineer needs to implement a solution to create a repository for this data. The solution must ensure that the data stays synchronized during batch training and real-time inference.

Which solution will meet these requirements?

Options:

A.  

Ingest data into SageMaker Feature Store from the S3 bucket. Apply tags and indexes.

B.  

Use Amazon Athena. Create tables by using CREATE TABLE AS SELECT (CTAS) queries to group data.

C.  

Use AWS Lake Formation. Apply tag-based control on the data.

D.  

Use the Generate Data Insights function in SageMaker Data Wrangler.

Questions 24

An ML engineer at a credit card company built and deployed an ML model by using Amazon SageMaker AI. The model was trained on transaction data that contained very few fraudulent transactions. After deployment, the model is underperforming.

What should the ML engineer do to improve the model’s performance?

Options:

A.  

Retrain the model with a different SageMaker built-in algorithm.

B.  

Use random undersampling to reduce the majority class and retrain the model.

C.  

Use Synthetic Minority Oversampling Technique (SMOTE) to generate synthetic minority samples and retrain the model.

D.  

Use random oversampling to duplicate minority samples and retrain the model.
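
The difference between random oversampling and SMOTE is easier to see in code: the former copies existing minority rows, while SMOTE interpolates between a minority row and one of its nearest minority neighbours. The sketch below is a simplified pure-Python version under stated assumptions (Euclidean distance, numeric features only, hypothetical sample points), not a production implementation.

```python
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # A random point on the segment between the base and the neighbour.
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class feature vectors (for example, scaled amount and hour).
fraud = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
new_points = smote(fraud, n_synthetic=6)
print(len(new_points), "synthetic samples")
```

Because every synthetic point lies between two real minority points, the new samples add variety that plain duplication does not.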

Questions 25

An ML engineer has a custom container that performs k-fold cross-validation and logs an average F1 score during training. The ML engineer wants Amazon SageMaker AI Automatic Model Tuning (AMT) to select hyperparameters that maximize the average F1 score.

How should the ML engineer integrate the custom metric into SageMaker AI AMT?

Options:

A.  

Define the average F1 score in the TrainingInputMode parameter.

B.  

Define a metric definition in the tuning job that uses a regular expression to capture the average F1 score from the training logs.

C.  

Publish the average F1 score as a custom Amazon CloudWatch metric.

D.  

Write the F1 score to a JSON file in Amazon S3 and reference it in ObjectiveMetricName.
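
As background for the options above, SageMaker AI metric definitions pair a metric name with a regular expression that is applied to the training logs. The sketch below checks such a regex locally with Python's re module. The metric name avg_f1 and the log line format are hypothetical and depend on what the custom container actually prints.

```python
import re

# Shape of a metric definition and tuning objective as used by SageMaker AI
# tuning jobs; the metric name and regex are assumptions for this example.
metric_definitions = [
    {"Name": "avg_f1", "Regex": r"avg_f1=([0-9\.]+)"},
]
objective = {"Type": "Maximize", "MetricName": "avg_f1"}

# Hypothetical line that the training container would write to its logs.
log_line = "fold 5/5 done; avg_f1=0.8731"

match = re.search(metric_definitions[0]["Regex"], log_line)
value = float(match.group(1)) if match else None
print(objective["MetricName"], value)
```

Testing the regex against a sample log line before launching a tuning job is a cheap way to catch a pattern that never matches.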

Questions 26

A company plans to use Amazon SageMaker AI to build image classification models. The company has 6 TB of training data stored on Amazon FSx for NetApp ONTAP. The file system is in the same VPC as SageMaker AI.

An ML engineer must make the training data accessible to SageMaker AI training jobs.

Which solution will meet these requirements?

Options:

A.  

Mount the FSx for ONTAP file system as a volume to the SageMaker AI instance.

B.  

Create an Amazon S3 bucket and use Mountpoint for Amazon S3 to link the bucket to FSx for ONTAP.

C.  

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

D.  

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Questions 27

A company has deployed an XGBoost prediction model in production to predict if a customer is likely to cancel a subscription. The company uses Amazon SageMaker Model Monitor to detect deviations in the F1 score.

During a baseline analysis of model quality, the company recorded a threshold for the F1 score. After several months of no change, the model's F1 score decreases significantly.

What could be the reason for the reduced F1 score?

Options:

A.  

Concept drift occurred in the underlying customer data that was used for predictions.

B.  

The model was not sufficiently complex to capture all the patterns in the original baseline data.

C.  

The original baseline data had a data quality issue of missing values.

D.  

Incorrect ground truth labels were provided to Model Monitor during the calculation of the baseline.

Questions 28

An ML engineer is training an XGBoost regression model in Amazon SageMaker AI. The ML engineer conducts several rounds of hyperparameter tuning with random grid search. After these rounds of tuning, the error rate on the test hold-out dataset is much larger than the error rate on the training dataset.

The ML engineer needs to make changes before running the hyperparameter grid search again.

Which changes will improve the model's performance? (Select TWO.)

Options:

A.  

Increase the model complexity by increasing the number of features in the dataset.

B.  

Decrease the model complexity by reducing the number of features in the dataset.

C.  

Decrease the model complexity by reducing the number of samples in the dataset.

D.  

Increase the value of the L2 regularization parameter.

E.  

Decrease the value of the L2 regularization parameter.
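
The effect of an L2 penalty can be seen in the simplest possible model. The sketch below fits a one-weight linear model without an intercept by minimizing the squared error plus l2 times the squared weight, which has the closed form w = sum(x*y) / (sum(x*x) + l2). This is a generic ridge illustration, not XGBoost's internals (where the L2 term applies to leaf weights), and the data points are hypothetical.

```python
def ridge_weight(xs, ys, l2):
    """Closed-form weight minimizing sum((y - w*x)**2) + l2 * w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2)

# Hypothetical data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for l2 in (0.0, 1.0, 14.0, 100.0):
    print(f"l2={l2:6.1f}  weight={ridge_weight(xs, ys, l2):.3f}")
```

A larger penalty pulls the weight toward zero, which is how stronger regularization limits a model's ability to fit noise in the training set.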

Questions 29

A company uses Amazon SageMaker AI to support ML workflows such as model training and deployment.

Select the correct registry from the following list to meet the requirements for each use case with the LEAST operational overhead. Each registry should be selected one or more times. (Select FOUR.)

• Amazon Elastic Container Registry (Amazon ECR)

• SageMaker Model Registry

Options:

Questions 30

A company is using ML to predict the presence of a specific weed in a farmer's field. The company is using the Amazon SageMaker linear learner built-in algorithm with a value of multiclass_classifier for the predictor_type hyperparameter.

What should the company do to MINIMIZE false positives?

Options:

A.  

Set the value of the weight decay hyperparameter to zero.

B.  

Increase the number of training epochs.

C.  

Increase the value of the target_precision hyperparameter.

D.  

Change the value of the predictor_type hyperparameter to regressor.

Questions 31

An ML engineer needs to organize a large set of text documents into topics. The ML engineer will not know what the topics are in advance. The ML engineer wants to use built-in algorithms or pre-trained models available through Amazon SageMaker AI to process the documents.

Which solution will meet these requirements?

Options:

A.  

Use the BlazingText algorithm to identify the relevant text and to create a set of topics based on the documents.

B.  

Use the Sequence-to-Sequence algorithm to summarize the text and to create a set of topics based on the documents.

C.  

Use the Object2Vec algorithm to create embeddings and to create a set of topics based on the embeddings.

D.  

Use the Latent Dirichlet Allocation (LDA) algorithm to process the documents and to create a set of topics based on the documents.

Questions 32

An ML engineering team is spread across multiple locations. When the lead ML engineer opens an Amazon SageMaker AI notebook, the ML engineer does not see the latest merged notebook made by other team members from a Git repository.

The lead ML engineer must see the latest SageMaker AI notebook updates.

Which solution will meet this requirement?

Options:

A.  

Run the !git pull origin master command.

B.  

Run the !git commit command.

C.  

Run the !git push origin master command.

D.  

Run the !git branch command.

Questions 33

An ML engineer needs to process thousands of existing CSV objects and new CSV objects that are uploaded. The CSV objects are stored in a central Amazon S3 bucket and have the same number of columns. One of the columns is a transaction date. The ML engineer must query the data based on the transaction date.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Use an Amazon Athena CREATE TABLE AS SELECT (CTAS) statement to create a table based on the transaction date from data in the central S3 bucket. Query the objects from the table.

B.  

Create a new S3 bucket for processed data. Set up S3 replication from the central S3 bucket to the new S3 bucket. Use S3 Object Lambda to query the objects based on transaction date.

C.  

Create a new S3 bucket for processed data. Use AWS Glue for Apache Spark to create a job to query the CSV objects based on transaction date. Configure the job to store the results in the new S3 bucket. Query the objects from the new S3 bucket.

D.  

Create a new S3 bucket for processed data. Use Amazon Data Firehose to transfer the data from the central S3 bucket to the new S3 bucket. Configure Firehose to run an AWS Lambda function to query the data based on transaction date.

Questions 34

An ML engineer needs to create data ingestion pipelines and ML model deployment pipelines on AWS. All the raw data is stored in Amazon S3 buckets.

Which solution will meet these requirements?

Options:

A.  

Use Amazon Data Firehose to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

B.  

Use AWS Glue to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

C.  

Use Amazon Redshift ML to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

D.  

Use Amazon Athena to create the data ingestion pipelines. Use an Amazon SageMaker notebook to create the model deployment pipelines.

Questions 35

A company uses an ML model to recommend videos to users. The model is deployed on Amazon SageMaker AI. The model performed well initially after deployment, but the model's performance has degraded over time.

Which solution can the company use to identify model drift in the future?

Options:

A.  

Create a monitoring job in SageMaker Model Monitor. Then create a baseline from the training dataset.

B.  

Create a baseline from the training dataset. Then create a monitoring job in SageMaker Model Monitor.

C.  

Create a baseline by using a built-in rule in SageMaker Clarify. Monitor the drift in Amazon CloudWatch.

D.  

Retrain the model on new data. Compare the retrained model's performance to the original model's performance.

Questions 36

A company has deployed a model to predict the churn rate for its games by using Amazon SageMaker Studio. After the model is deployed, the company must monitor the model performance for data drift and inspect the report.

Select and order the correct steps from the following list to model monitor actions. Select each step one time. (Select and order THREE.)

• Check the analysis results on the SageMaker Studio console.

• Create a Shapley Additive Explanations (SHAP) baseline for the model by using Amazon SageMaker Clarify.

• Schedule an hourly model explainability monitor.

Options:

Questions 37

A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.

Which solution will meet these requirements?

Options:

A.  

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

B.  

Use a custom Amazon SageMaker AI notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

C.  

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

D.  

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Questions 38

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records every second.

The company needs to implement a scalable solution on AWS to identify anomalous data points.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Ingest real-time data into Amazon Kinesis Data Streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to process the data streams and to detect data anomalies.

B.  

Ingest real-time data into Amazon Kinesis Data Streams. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

C.  

Ingest real-time data into Apache Kafka on Amazon EC2 instances. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

D.  

Send real-time data to an Amazon Simple Queue Service (Amazon SQS) FIFO queue. Create an AWS Lambda function to consume the queue messages. Program the Lambda function to start an AWS Glue extract, transform, and load (ETL) job for batch processing and anomaly detection.

Questions 39

A company has trained an ML model in Amazon SageMaker. The company needs to host the model to provide inferences in a production environment.

The model must be highly available and must respond with minimum latency. The size of each request will be between 1 KB and 3 MB. The model will receive unpredictable bursts of requests during the day. The inferences must adapt proportionally to the changes in demand.

How should the company deploy the model into production to meet these requirements?

Options:

A.  

Create a SageMaker real-time inference endpoint. Configure auto scaling. Configure the endpoint to present the existing model.

B.  

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster. Use ECS scheduled scaling that is based on the CPU of the ECS cluster.

C.  

Install SageMaker Operator on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Deploy the model in Amazon EKS. Set horizontal pod auto scaling to scale replicas based on the memory metric.

D.  

Use Spot Instances with a Spot Fleet behind an Application Load Balancer (ALB) for inferences. Use the ALBRequestCountPerTarget metric as the metric for auto scaling.

Questions 40

A company has a large, unstructured dataset. The dataset includes many duplicate records across several key attributes.

Which solution on AWS will detect duplicates in the dataset with the LEAST code development?

Options:

A.  

Use Amazon Mechanical Turk jobs to detect duplicates.

B.  

Use Amazon QuickSight ML Insights to build a custom deduplication model.

C.  

Use Amazon SageMaker Data Wrangler to pre-process and detect duplicates.

D.  

Use the AWS Glue FindMatches transform to detect duplicates.

Questions 41

A company wants to use large language models (LLMs) that are supported by Amazon Bedrock to develop a chat interface for the company's internal technical documentation. The company stores the documentation as dozens of text files that are several megabytes in total size. The company updates the text files often.

Which solution will meet these requirements MOST cost-effectively?

Options:

A.  

Create a new LLM on Amazon Bedrock. Train the new LLM on the original dataset and the company documentation. Make the new model available in Bedrock for calls from the chat interface.

B.  

Integrate the company documentation with Amazon Bedrock guardrails. Invoke the guardrails for all Amazon Bedrock calls from the chat interface.

C.  

Use all the text files to fine tune a model in Amazon Bedrock. Use the fine-tuned model to process user prompts.

D.  

Upload all the text files to an Amazon Bedrock knowledge base. Use the knowledge base to provide context when the chat interface makes calls to Amazon Bedrock.

Questions 42

A company is running ML models on premises by using custom Python scripts and proprietary datasets. The company is using PyTorch. The model building requires unique domain knowledge. The company needs to move the models to AWS.

Which solution will meet these requirements with the LEAST effort?

Options:

A.  

Use SageMaker built-in algorithms to train the proprietary datasets.

B.  

Use SageMaker script mode and premade images for ML frameworks.

C.  

Build a container on AWS that includes custom packages and a choice of ML frameworks.

D.  

Purchase similar production models through AWS Marketplace.

Questions 43

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records per second.

The company needs a scalable AWS solution to identify anomalous data points with the LEAST operational overhead.

Which solution will meet these requirements?

Options:

A.  

Ingest data into Amazon Kinesis Data Streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to detect anomalies.

B.  

Ingest data into Kinesis Data Streams. Deploy a SageMaker AI endpoint and use AWS Lambda to detect anomalies.

C.  

Ingest data into Apache Kafka on Amazon EC2 and use SageMaker AI for detection.

D.  

Send data to Amazon SQS and use AWS Glue ETL jobs for batch anomaly detection.

Questions 44

A company is creating an ML model to identify defects in a product. The company has gathered a dataset and has stored the dataset in TIFF format in Amazon S3. The dataset contains 200 images in which the most common defects are visible. The dataset also contains 1,800 images in which there is no defect visible.

An ML engineer trains the model and notices poor performance in some classes. The ML engineer identifies a class imbalance problem in the dataset.

What should the ML engineer do to solve this problem?

Options:

A.  

Use a few hundred images and Amazon Rekognition Custom Labels to train a new model.

B.  

Undersample the 200 images in which the most common defects are visible.

C.  

Oversample the 200 images in which the most common defects are visible.

D.  

Use all 2,000 images and Amazon Rekognition Custom Labels to train a new model.
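
For reference, random oversampling of a minority class can be sketched as follows. This is a minimal illustration, not a prescribed solution: the file names are hypothetical stand-ins for the 200 defect images and 1,800 no-defect images described in the question, and the helper `oversample_minority` is an assumed name.

```python
import random


def oversample_minority(majority, minority, seed=0):
    """Randomly duplicate minority samples until both classes are the same size."""
    if not minority:
        raise ValueError("minority class is empty")
    rng = random.Random(seed)
    # Number of extra copies needed; range() is empty if the classes are already balanced.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra


# Hypothetical file names standing in for the images in the question.
no_defect = [f"ok_{i}.tiff" for i in range(1800)]
defect = [f"defect_{i}.tiff" for i in range(200)]

balanced_ok, balanced_defect = oversample_minority(no_defect, defect)
print(len(balanced_ok), len(balanced_defect))
```

In practice, image oversampling is often combined with augmentation (flips, crops) so that the duplicated samples are not pixel-identical.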

Questions 45

A company regularly receives new training data from a vendor of an ML model. The vendor delivers cleaned and prepared data to the company’s Amazon S3 bucket every 3–4 days.

The company has an Amazon SageMaker AI pipeline to retrain the model. An ML engineer needs to run the pipeline automatically when new data is uploaded to the S3 bucket.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A.  

Create an S3 lifecycle rule to transfer the data to the SageMaker AI training instance and initiate training.

B.  

Create an AWS Lambda function that scans the S3 bucket and initiates the pipeline when new data is uploaded.

C.  

Create an Amazon EventBridge rule that matches S3 upload events and configures the SageMaker pipeline as the target.

D.  

Use Amazon Managed Workflows for Apache Airflow (MWAA) to orchestrate the pipeline when new data is uploaded.
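
As background for the event-driven options, an EventBridge event pattern for S3 uploads can be sketched as a plain dictionary. The bucket name below is hypothetical, and `matches` is only a toy matcher covering exact-value lists and nested objects, not EventBridge's actual matching engine. The sketch also assumes that EventBridge notifications are enabled on the bucket.

```python
import json


def s3_object_created_pattern(bucket_name):
    """Event pattern that selects S3 'Object Created' events for one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket_name]}},
    }


def matches(pattern, event):
    """Toy matcher: every pattern key must exist and match in the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical bucket name and a trimmed-down sample event.
pattern = s3_object_created_pattern("vendor-training-data")
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {"bucket": {"name": "vendor-training-data"}, "object": {"key": "batch.csv"}},
}
print(json.dumps(pattern, indent=2))
print(matches(pattern, sample_event))
```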

Questions 46

An ML engineer is training a simple neural network model. The ML engineer tracks the performance of the model over time on a validation dataset. The model's performance improves substantially at first and then degrades after a specific number of epochs.

Which solutions will mitigate this problem? (Choose two.)

Options:

A.  

Enable early stopping on the model.

B.  

Increase dropout in the layers.

C.  

Increase the number of layers.

D.  

Increase the number of neurons.

E.  

Investigate and reduce the sources of model bias.
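
The validation behavior described in the question (improvement followed by degradation) is what a patience-based early stopping rule watches for. The sketch below is framework-independent; the loss values and the `patience` setting are made up for illustration.

```python
def train_with_early_stopping(val_losses, patience=2):
    """Return (best_epoch, stopped_epoch) for a sequence of per-epoch validation losses.

    Training stops once the loss has failed to improve for `patience` epochs in a row.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses: improving at first, then degrading.
losses = [0.90, 0.60, 0.45, 0.40, 0.43, 0.47, 0.55]
best, stopped = train_with_early_stopping(losses, patience=2)
print(best, stopped)  # best epoch 3, stopped at epoch 5
```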

Questions 47

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model's performance?

Options:

A.  

Accuracy

B.  

Area Under the ROC Curve (AUC)

C.  

F1 score

D.  

Mean absolute error (MAE)
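
For reference, mean absolute error is the average of the absolute differences between predicted and actual values, which keeps it in the same unit as the target. The apartment prices below are invented for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted apartment prices.
actual = [250_000, 310_000, 180_000]
predicted = [240_000, 330_000, 185_000]

mae = mean_absolute_error(actual, predicted)
print(round(mae, 2))  # (10000 + 20000 + 5000) / 3
```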

Questions 48

An ML engineer wants to run a training job on Amazon SageMaker AI by using multiple GPUs. The training dataset is stored in Apache Parquet format.

The Parquet files are too large to fit into the memory of the SageMaker AI training instances.

Which solution will fix the memory problem?

Options:

A.  

Attach an Amazon EBS Provisioned IOPS SSD volume and store the files on the EBS volume.

B.  

Repartition the Parquet files by using Apache Spark on Amazon EMR and use the repartitioned files for training.

C.  

Change to memory-optimized instance types with sufficient memory.

D.  

Use SageMaker distributed data parallelism (SMDDP) to split memory usage.

Questions 49

A company is using Amazon SageMaker and millions of files to train an ML model. Each file is several megabytes in size. The files are stored in an Amazon S3 bucket. The company needs to improve training performance.

Which solution will meet these requirements in the LEAST amount of time?

Options:

A.  

Transfer the data to a new S3 bucket that provides S3 Express One Zone storage. Adjust the training job to use the new S3 bucket.

B.  

Create an Amazon FSx for Lustre file system. Link the file system to the existing S3 bucket. Adjust the training job to read from the file system.

C.  

Create an Amazon Elastic File System (Amazon EFS) file system. Transfer the existing data to the file system. Adjust the training job to read from the file system.

D.  

Create an Amazon ElastiCache (Redis OSS) cluster. Link the Redis OSS cluster to the existing S3 bucket. Stream the data from the Redis OSS cluster directly to the training job.

Questions 50

An ML engineer needs to implement a solution to host a trained ML model. The rate of requests to the model will be inconsistent throughout the day.

The ML engineer needs a scalable solution that minimizes costs when the model is not in use. The solution also must maintain the model's capacity to respond to requests during times of peak usage.

Which solution will meet these requirements?

Options:

A.  

Create AWS Lambda functions that have fixed concurrency to host the model. Configure the Lambda functions to automatically scale based on the number of requests to the model.

B.  

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Set a static number of tasks to handle requests during times of peak usage.

C.  

Deploy the model to an Amazon SageMaker endpoint. Deploy multiple copies of the model to the endpoint. Create an Application Load Balancer to route traffic between the different copies of the model at the endpoint.

D.  

Deploy the model to an Amazon SageMaker endpoint. Create SageMaker endpoint auto scaling policies that are based on Amazon CloudWatch metrics to adjust the number of instances dynamically.
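
As an illustration of the endpoint auto scaling approach mentioned above, the request parameters for Application Auto Scaling can be sketched as plain dictionaries. The endpoint name, variant name, capacities, and target value are hypothetical, and the sketch only builds the parameters; it assumes they would later be passed to the `register_scalable_target` and `put_scaling_policy` operations.

```python
def scalable_target(endpoint_name, variant_name, min_instances, max_instances):
    """Parameters that register an endpoint variant's instance count as scalable."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }


def target_tracking_policy(target, invocations_per_instance):
    """Target-tracking policy keyed on a predefined CloudWatch metric."""
    return {
        "PolicyName": "invocations-target-tracking",  # hypothetical name
        "ServiceNamespace": target["ServiceNamespace"],
        "ResourceId": target["ResourceId"],
        "ScalableDimension": target["ScalableDimension"],
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }


# Hypothetical endpoint and variant names.
target = scalable_target("demo-endpoint", "AllTraffic", min_instances=1, max_instances=4)
policy = target_tracking_policy(target, invocations_per_instance=100)
print(policy["ResourceId"])
```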

Questions 51

A company wants to use Amazon SageMaker AI to host an ML model that runs on CPU for real-time predictions. The model has intermittent traffic during business hours and periods of no traffic after business hours.

Which hosting option will serve inference requests in the MOST cost-effective manner?

Options:

A.  

Deploy the model to a real-time endpoint with scheduled auto scaling.

B.  

Deploy the model to a SageMaker AI Serverless Inference endpoint with provisioned concurrency during business hours.

C.  

Deploy the model to an asynchronous inference endpoint with auto scaling to zero.

D.  

Deploy the model to a real-time endpoint and activate it only during business hours using AWS Lambda.

Questions 52

An ML engineer is setting up a CI/CD pipeline for an ML workflow in Amazon SageMaker AI. The pipeline must automatically retrain, test, and deploy a model whenever new data is uploaded to an Amazon S3 bucket. New data files are approximately 10 GB in size. The ML engineer also needs to track model versions for auditing.

Which solution will meet these requirements?

Options:

A.  

Use AWS CodePipeline, Amazon S3, and AWS CodeBuild to retrain and deploy the model automatically and track model versions.

B.  

Use SageMaker Pipelines with the SageMaker Model Registry to orchestrate model training and version tracking.

C.  

Use AWS Lambda and Amazon EventBridge to retrain and deploy the model and track versions via logs.

D.  

Manually retrain and deploy the model using SageMaker notebook instances and track versions with AWS CloudTrail.

Questions 53

A company has an ML model that is deployed to an Amazon SageMaker AI endpoint for real-time inference. The company needs to deploy a new model. The company must compare the new model's performance to the currently deployed model's performance before shifting all traffic to the new model.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A.  

Deploy the new model to a separate endpoint. Manually split traffic between the two endpoints.

B.  

Deploy the new model to a separate endpoint. Use Amazon CloudFront to distribute traffic between the two endpoints.

C.  

Deploy the new model as a shadow variant on the same endpoint as the current model. Route a portion of live traffic to the shadow model for evaluation.

D.  

Use AWS Lambda functions with custom logic to route traffic between the current model and the new model.

Questions 54

A company wants to develop an ML model by using tabular data from its customers. The data contains meaningful ordered features with sensitive information that should not be discarded. An ML engineer must ensure that the sensitive data is masked before another team starts to build the model.

Which solution will meet these requirements?

Options:

A.  

Use Amazon Macie to categorize the sensitive data.

B.  

Prepare the data by using AWS Glue DataBrew.

C.  

Run an AWS Batch job to change the sensitive data to random values.

D.  

Run an Amazon EMR job to change the sensitive data to random values.

Questions 55

An ML engineer decides to use Amazon SageMaker AI automated model tuning (AMT) for hyperparameter optimization (HPO). The ML engineer requires a tuning strategy that uses regression to slowly and sequentially select the next set of hyperparameters based on previous runs. The strategy must work across small hyperparameter ranges.

Which solution will meet these requirements?

Options:

A.  

Grid search

B.  

Random search

C.  

Bayesian optimization

D.  

Hyperband

Questions 56

An ML engineer is using a training job to fine-tune a deep learning model in Amazon SageMaker Studio. The ML engineer previously used the same pre-trained model with a similar dataset. The ML engineer expects vanishing gradient, underutilized GPU, and overfitting problems.

The ML engineer needs to implement a solution to detect these issues and to react in predefined ways when the issues occur. The solution also must provide comprehensive real-time metrics during the training.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Use TensorBoard to monitor the training job. Publish the findings to an Amazon Simple Notification Service (Amazon SNS) topic. Create an AWS Lambda function to consume the findings and to initiate the predefined actions.

B.  

Use Amazon CloudWatch default metrics to gain insights about the training job. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

C.  

Expand the metrics in Amazon CloudWatch to include the gradients in each training step. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

D.  

Use SageMaker Debugger built-in rules to monitor the training job. Configure the rules to initiate the predefined actions.

Questions 57

An ML engineer is using Amazon SageMaker Canvas to build a custom ML model from an imported dataset. The model must make continuous numeric predictions based on 10 years of data.

Which metric should the ML engineer use to evaluate the model’s performance?

Options:

A.  

Accuracy

B.  

InferenceLatency

C.  

Area Under the ROC Curve (AUC)

D.  

Root Mean Square Error (RMSE)
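
For reference, root mean square error takes the square root of the mean squared difference, so large errors weigh more heavily than they do in a simple absolute average. The values below are invented for illustration.

```python
import math


def root_mean_square_error(y_true, y_pred):
    """Square root of the mean squared difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical numeric targets and predictions; the errors are 1, 0, and 3.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 10.0]

rmse = root_mean_square_error(actual, predicted)
mean_abs = sum(abs(t - p) for t, p in zip(actual, predicted)) / len(actual)
print(round(rmse, 4), round(mean_abs, 4))  # RMSE exceeds the mean absolute error here
```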

Questions 58

A streaming media company uses a churn risk model to assess the churn risk of its premium tier customers. Each month, the company runs an aggregation job on individual customers’ streaming data and uploads the user engagement features to an Amazon S3 bucket. The company manually re-trains the churn risk model with the user engagement data.

The current process requires manual intervention and is time-consuming. The company needs a solution that automatically re-trains the churn prediction model with the most recent data.

Which solution will meet these requirements with the SHORTEST delay?

Options:

A.  

Set up an Amazon EventBridge rule to run an Amazon Elastic Container Service (Amazon ECS) task hourly for model re-training. Configure the ECS task to use the most recent data from the S3 bucket.

B.  

Configure the S3 bucket to invoke an AWS Lambda function that re-trains the model.

C.  

Create a pipeline in Amazon SageMaker Pipelines for re-training. Configure an Amazon EventBridge rule to monitor S3 PutObject creation events and invoke the pipeline.

D.  

Create a pipeline in Amazon SageMaker Pipelines for re-training. Configure a pipeline schedule to re-train the model.

Questions 59

A company is gathering audio, video, and text data in various languages. The company needs to use a large language model (LLM) to summarize the gathered data that is in Spanish.

Which solution will meet these requirements in the LEAST amount of time?

Options:

A.  

Train and deploy a model in Amazon SageMaker to convert the data into English text. Train and deploy an LLM in SageMaker to summarize the text.

B.  

Use Amazon Transcribe and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Jurassic model to summarize the text.

C.  

Use Amazon Rekognition and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Anthropic Claude model to summarize the text.

D.  

Use Amazon Comprehend and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Stable Diffusion model to summarize the text.

Questions 60

A company is developing an application that reads animal descriptions from user prompts and generates images based on the information in the prompts. The application reads a message from an Amazon Simple Queue Service (Amazon SQS) queue. Then the application uses Amazon Titan Image Generator on Amazon Bedrock to generate an image based on the information in the message. Finally, the application removes the message from the SQS queue.

Which IAM permissions should the company assign to the application's IAM role? (Select TWO.)

Options:

A.  

Allow the bedrock:InvokeModel action for the Amazon Titan Image Generator resource.

B.  

Allow the bedrock:Get* action for the Amazon Titan Image Generator resource.

C.  

Allow the sqs:ReceiveMessage action and the sqs:DeleteMessage action for the SQS queue resource.

D.  

Allow the sqs:GetQueueAttributes action and the sqs:DeleteMessage action for the SQS queue resource.

E.  

Allow the sagemaker:PutRecord* action for the Amazon Titan Image Generator resource.
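
The workflow in the question (receive a message, invoke the image model, delete the message) can be mapped to a least-privilege policy document, sketched here as a Python dictionary. The Region, account ID, queue name, and model ID are all assumptions chosen for illustration and do not come from the question.

```python
import json

# Hypothetical identifiers used only for illustration.
REGION = "us-east-1"
ACCOUNT_ID = "111122223333"
QUEUE_NAME = "animal-prompts"
MODEL_ID = "amazon.titan-image-generator-v1"


def build_policy():
    """IAM policy covering the three steps the application performs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeImageModel",
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                "Resource": f"arn:aws:bedrock:{REGION}::foundation-model/{MODEL_ID}",
            },
            {
                "Sid": "ReadAndDeleteQueueMessages",
                "Effect": "Allow",
                "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
                "Resource": f"arn:aws:sqs:{REGION}:{ACCOUNT_ID}:{QUEUE_NAME}",
            },
        ],
    }


policy = build_policy()
print(json.dumps(policy, indent=2))
```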

Questions 61

A company is exploring generative AI and wants to add a new product feature. An ML engineer is making API calls from existing Amazon EC2 instances to Amazon Bedrock.

The EC2 instances are in a private subnet and must remain private during the implementation. The EC2 instances have a security group that allows access to all IP addresses in the private subnet.

What should the ML engineer do to establish a connection between the EC2 instances and Amazon Bedrock?

Options:

A.  

Modify the security group to allow inbound and outbound traffic to and from Amazon Bedrock.

B.  

Use AWS PrivateLink to access Amazon Bedrock through an interface VPC endpoint.

C.  

Configure Amazon Bedrock to use the private subnet where the EC2 instances are deployed.

D.  

Use AWS Direct Connect to link the VPC to Amazon Bedrock.

Questions 62

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

Which AWS service or feature can aggregate the data from the various data sources?

Options:

A.  

Amazon EMR Spark jobs

B.  

Amazon Kinesis Data Streams

C.  

Amazon DynamoDB

D.  

AWS Lake Formation

Questions 63

An ML company wants to monitor and analyze the API calls that its AWS resources make. The company has created an AWS CloudTrail log file that logs to an Amazon S3 bucket. The company has also created an organization in AWS Organizations to manage permissions across accounts.

The company needs to enable log file validation to ensure the integrity of its log files.

Which solution will meet these requirements?

Options:

A.  

Enable CloudTrail log file integrity validation.

B.  

Create a multi-Region trail in CloudTrail.

C.  

Create a trail in CloudTrail for the organization.

D.  

Enable Amazon CloudWatch Logs delivery.

Questions 64

A company uses an NFS-based data store to store data for ML training. Linux-based systems access the data store.

The company needs a hybrid system to make the shared data store accessible to on-premises servers and Amazon SageMaker AI notebooks that will consume the data. File locking is required for the data producers.

Which AWS storage solution will meet these requirements?

Options:

A.  

Use an Amazon S3 bucket to store the data. Use Mountpoint for Amazon S3 to mount the S3 bucket to the on-premises servers and the SageMaker AI notebooks.

B.  

Use an Amazon Elastic File System (Amazon EFS) file system to store the data. Mount the file system to the on-premises servers and the SageMaker AI notebooks.

C.  

Use an Amazon FSx for Lustre file system to store the data. Mount the file system to the on-premises servers and the SageMaker AI notebooks.

D.  

Use an Amazon Elastic Block Store (Amazon EBS) volume to store the data. Mount the volume to the on-premises servers and the SageMaker AI notebooks.

Questions 65

A company uses a training job on Amazon SageMaker AI to train a neural network. The job first trains a model and then evaluates the model's performance against a test dataset. The company uses the results from the evaluation phase to decide if the trained model will go to production.

The training phase takes too long. The company needs solutions that can shorten training time without decreasing the model's final performance.

Select the correct solutions from the following list to meet the requirements for each description. Select each solution one time or not at all. (Select THREE.)

· Change the epoch count.

· Choose an Amazon EC2 Spot Fleet.

· Change the batch size.

· Use early stopping on the training job.

· Use the SageMaker AI distributed data parallelism (SMDDP) library.

· Stop the training job.

Options:

Questions 66

A company is building a near real-time data analytics application to detect anomalies and failures for industrial equipment. The company has thousands of IoT sensors that send data every 60 seconds. When new versions of the application are released, the company wants to ensure that application code bugs do not prevent the application from running.

Which solution will meet these requirements?

Options:

A.  

Use Amazon Managed Service for Apache Flink with the system rollback capability enabled to build the data analytics application.

B.  

Use Amazon Managed Service for Apache Flink with manual rollback when an error occurs to build the data analytics application.

C.  

Use Amazon Data Firehose to deliver real-time streaming data programmatically for the data analytics application. Pause the stream when a new version of the application is released and resume the stream after the application is deployed.

D.  

Use Amazon Data Firehose to deliver data to Amazon EC2 instances across two Availability Zones for the data analytics application.

Questions 67

An ML engineer has trained a neural network by using stochastic gradient descent (SGD). The neural network performs poorly on the test set. The values for training loss and validation loss remain high and show an oscillating pattern. The values decrease for a few epochs and then increase for a few epochs before repeating the same cycle.

What should the ML engineer do to improve the training process?

Options:

A.  

Introduce early stopping.

B.  

Increase the size of the test set.

C.  

Increase the learning rate.

D.  

Decrease the learning rate.
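
The effect of step size on convergence can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², an assumed stand-in that does not reproduce the noise of SGD on a real network: with a learning rate of 1.0 the weight flips sign on every step and never approaches the minimum, while a rate of 0.1 shrinks it steadily.

```python
def weight_trajectory(learning_rate, w0=1.0, steps=6):
    """Weights visited by gradient descent on f(w) = w**2 (gradient is 2*w)."""
    w = w0
    trajectory = []
    for _ in range(steps):
        trajectory.append(w)
        w -= learning_rate * 2 * w
    return trajectory


high = weight_trajectory(learning_rate=1.0)  # bounces between 1.0 and -1.0
low = weight_trajectory(learning_rate=0.1)   # each step multiplies w by 0.8

print(high)
print([round(w, 4) for w in low])
```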

Questions 68

An ML engineer wants to deploy a workflow that processes streaming IoT sensor data and periodically retrains ML models. The most recent model versions must be deployed to production.

Which service will meet these requirements?

Options:

A.  

Amazon SageMaker Pipelines

B.  

Amazon Managed Workflows for Apache Airflow (MWAA)

C.  

AWS Lambda

D.  

Apache Spark

Questions 69

An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host.

Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?

Options:

A.  

AWS::SageMaker::Model

B.  

AWS::SageMaker::Endpoint

C.  

AWS::SageMaker::NotebookInstance

D.  

AWS::SageMaker::Pipeline
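
For context, a CloudFormation template fragment that declares a SageMaker model resource can be sketched as a Python dictionary and serialized to JSON. The logical ID `InferenceModel`, the role ARN, the container image URI, and the S3 path are hypothetical placeholders. Hosting the model would additionally require endpoint configuration and endpoint resources, which are omitted here.

```python
import json


def model_template(role_arn, image_uri, model_data_url):
    """Minimal template declaring one SageMaker model resource."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "InferenceModel": {  # hypothetical logical ID
                "Type": "AWS::SageMaker::Model",
                "Properties": {
                    "ExecutionRoleArn": role_arn,
                    "PrimaryContainer": {
                        "Image": image_uri,
                        "ModelDataUrl": model_data_url,
                    },
                },
            }
        },
    }


# Placeholder values for illustration only.
template = model_template(
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    image_uri="111122223333.dkr.ecr.us-east-1.amazonaws.com/example-inference:latest",
    model_data_url="s3://example-bucket/model/model.tar.gz",
)
print(json.dumps(template, indent=2))
```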

Questions 70

A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models.

The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data.

Which solution will provide the HIGHEST performance for data retrieval?

Options:

A.  

Keep all the time-series data without partitioning in the S3 bucket. Manually move data that is older than 30 days to separate S3 buckets.

B.  

Create AWS Lambda functions to copy the time-series data into separate S3 buckets. Apply S3 Lifecycle policies to archive data that is older than 30 days to S3 Glacier Flexible Retrieval.

C.  

Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval.

D.  

Put each day's time-series data into its own S3 bucket. Use S3 Lifecycle policies to archive S3 buckets that hold data that is older than 30 days to S3 Glacier Flexible Retrieval.
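
Date-based partitioning, mentioned in the options above, usually means encoding the date in the object key prefix so that a query over a few days scans only the matching prefixes. The sketch below assumes a Hive-style `dt=YYYY-MM-DD` layout and a hypothetical `clicks` base prefix.

```python
from datetime import date, timedelta


def partition_prefix(day, base="clicks"):
    """Hive-style key prefix for one day's data, e.g. clicks/dt=2026-05-30/."""
    return f"{base}/dt={day.isoformat()}/"


def recent_partitions(today, days=3, base="clicks"):
    """Prefixes that a query over the last `days` days would need to read."""
    return [partition_prefix(today - timedelta(days=i), base) for i in range(days)]


# Hypothetical reference date.
prefixes = recent_partitions(date(2026, 5, 30), days=3)
print(prefixes)
```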

Questions 71

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company needs to run an on-demand workflow to monitor bias drift for models that are deployed to real-time endpoints from the application.

Which action will meet this requirement?

Options:

A.  

Configure the application to invoke an AWS Lambda function that runs a SageMaker Clarify job.

B.  

Invoke an AWS Lambda function to pull the sagemaker-model-monitor-analyzer built-in SageMaker image.

C.  

Use AWS Glue Data Quality to monitor bias.

D.  

Use SageMaker notebooks to compare the bias.

Questions 72

A company uses Amazon Athena to query a dataset in Amazon S3. The dataset has a target variable that the company wants to predict.

The company needs to use the dataset in a solution to determine if a model can predict the target variable.

Which solution will provide this information with the LEAST development effort?

Options:

A.  

Create a new model by using Amazon SageMaker Autopilot. Report the model's achieved performance.

B.  

Implement custom scripts to perform data pre-processing, multiple linear regression, and performance evaluation. Run the scripts on Amazon EC2 instances.

C.  

Configure Amazon Macie to analyze the dataset and to create a model. Report the model's achieved performance.

D.  

Select a model from Amazon Bedrock. Tune the model with the data. Report the model's achieved performance.
