
AWS Certified Data Engineer - Associate (DEA-C01) Questions and Answers


Last Update: Sep 23, 2025
Total Questions: 190


Questions 1

A car sales company maintains data about cars that are listed for sale in an area. The company receives data about new car listings from vendors who upload the data daily as compressed files into Amazon S3. The compressed files are up to 5 KB in size. The company wants to see the most up-to-date listings as soon as the data is uploaded to Amazon S3.

A data engineer must automate and orchestrate the data processing workflow of the listings to feed a dashboard. The data engineer must also provide the ability to perform one-time queries and analytical reporting. The query solution must be scalable.

Which solution will meet these requirements MOST cost-effectively?

Options:

A.  

Use an Amazon EMR cluster to process incoming data. Use AWS Step Functions to orchestrate workflows. Use Apache Hive for one-time queries and analytical reporting. Use Amazon OpenSearch Service to bulk ingest the data into compute optimized instances. Use OpenSearch Dashboards in OpenSearch Service for the dashboard.

B.  

Use a provisioned Amazon EMR cluster to process incoming data. Use AWS Step Functions to orchestrate workflows. Use Amazon Athena for one-time queries and analytical reporting. Use Amazon QuickSight for the dashboard.

C.  

Use AWS Glue to process incoming data. Use AWS Step Functions to orchestrate workflows. Use Amazon Redshift Spectrum for one-time queries and analytical reporting. Use OpenSearch Dashboards in Amazon OpenSearch Service for the dashboard.

D.  

Use AWS Glue to process incoming data. Use AWS Lambda and S3 Event Notifications to orchestrate workflows. Use Amazon Athena for one-time queries and analytical reporting. Use Amazon QuickSight for the dashboard.
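For context, the event-driven orchestration that option D describes is typically wired up with an S3 event notification that invokes a Lambda function on every new upload. A minimal boto3 sketch follows; the bucket name and function ARN are assumptions, and the Lambda function must separately grant s3.amazonaws.com permission to invoke it.

import boto3

s3 = boto3.client("s3")

# Invoke a (hypothetical) Lambda function whenever a vendor uploads a new listing file.
s3.put_bucket_notification_configuration(
    Bucket="car-listings-raw",  # assumed bucket name
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:start-listings-etl",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)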

Questions 2

A data engineer configured an AWS Glue Data Catalog for data that is stored in Amazon S3 buckets. The data engineer needs to configure the Data Catalog to receive incremental updates.

The data engineer sets up event notifications for the S3 bucket and creates an Amazon Simple Queue Service (Amazon SQS) queue to receive the S3 events.

Which combination of steps should the data engineer take to meet these requirements with LEAST operational overhead? (Select TWO.)

Options:

A.  

Create an S3 event-based AWS Glue crawler to consume events from the SQS queue.

B.  

Define a time-based schedule to run the AWS Glue crawler, and perform incremental updates to the Data Catalog.

C.  

Use an AWS Lambda function to directly update the Data Catalog based on S3 events that the SQS queue receives.

D.  

Manually initiate the AWS Glue crawler to perform updates to the Data Catalog when there is a change in the S3 bucket.

E.  

Use AWS Step Functions to orchestrate the process of updating the Data Catalog based on S3 events that the SQS queue receives.
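For reference, the event-based crawler that option A describes reads S3 event notifications from an SQS queue instead of rescanning the whole bucket. A hedged boto3 sketch, with the crawler name, role, database, path, and queue ARN all assumed:

import boto3

glue = boto3.client("glue")

# Crawler that consumes S3 events from the SQS queue and crawls only changed prefixes.
glue.create_crawler(
    Name="incremental-s3-crawler",                          # assumed name
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # assumed role
    DatabaseName="datalake_db",                             # assumed database
    Targets={
        "S3Targets": [
            {
                "Path": "s3://datalake-bucket/prefix/",
                "EventQueueArn": "arn:aws:sqs:us-east-1:123456789012:s3-events-queue",
            }
        ]
    },
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_EVENT_MODE"},
)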

Questions 3

A company uses Amazon S3 to store data and Amazon QuickSight to create visualizations.

The company has an S3 bucket in an AWS account named Hub-Account. The S3 bucket is encrypted by an AWS Key Management Service (AWS KMS) key. The company's QuickSight instance is in a separate account named BI-Account.

The company updates the S3 bucket policy to grant access to the QuickSight service role. The company wants to enable cross-account access to allow QuickSight to interact with the S3 bucket.

Which combination of steps will meet this requirement? (Select TWO.)

Options:

A.  

Use the existing AWS KMS key to encrypt connections from QuickSight to the S3 bucket.

B.  

Add the S3 bucket as a resource that the QuickSight service role can access.

C.  

Use AWS Resource Access Manager (AWS RAM) to share the S3 bucket with the BI-Account account.

D.  

Add an IAM policy to the QuickSight service role to give QuickSight access to the KMS key that encrypts the S3 bucket.

E.  

Add the KMS key as a resource that the QuickSight service role can access.

Questions 4

A retail company has a customer data hub in an Amazon S3 bucket. Employees from many countries use the data hub to support company-wide analytics. A governance team must ensure that the company's data analysts can access data only for customers who are within the same country as the analysts.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A.  

Create a separate table for each country's customer data. Provide access to each analyst based on the country that the analyst serves.

B.  

Register the S3 bucket as a data lake location in AWS Lake Formation. Use the Lake Formation row-level security features to enforce the company's access policies.

C.  

Move the data to AWS Regions that are close to the countries where the customers are. Provide access to each analyst based on the country that the analyst serves.

D.  

Load the data into Amazon Redshift. Create a view for each country. Create separate IAM roles for each country to provide access to data from each country. Assign the appropriate roles to the analysts.

Questions 5

A data engineer needs to create a new empty table in Amazon Athena that has the same schema as an existing table named old-table.

Which SQL statement should the data engineer use to meet this requirement?

Options:

A.  

B.  

C.  

D.  

Questions 6

A data engineer needs to create an empty copy of an existing table in Amazon Athena to perform data processing tasks. The existing table in Athena contains 1,000 rows.

Which query will meet this requirement?

Options:

A.  

CREATE TABLE new_table LIKE old_table;

B.  

CREATE TABLE new_table AS SELECT * FROM old_table WITH NO DATA;

C.  

CREATE TABLE new_table AS SELECT * FROM old_table;

D.  

CREATE TABLE new_table AS SELECT * FROM old_table WHERE 1=1;
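As a point of reference, the CTAS statement in option B can be submitted through the Athena API; the sketch below assumes a database named sales_db and a query results bucket.

import boto3

athena = boto3.client("athena")

# "WITH NO DATA" copies only the schema of old_table, so new_table starts empty.
athena.start_query_execution(
    QueryString="CREATE TABLE new_table AS SELECT * FROM old_table WITH NO DATA;",
    QueryExecutionContext={"Database": "sales_db"},                                # assumed database
    ResultConfiguration={"OutputLocation": "s3://athena-query-results-bucket/"},   # assumed bucket
)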

Questions 7

A company stores details about transactions in an Amazon S3 bucket. The company wants to log all writes to the S3 bucket into another S3 bucket that is in the same AWS Region.

Which solution will meet this requirement with the LEAST operational effort?

Options:

A.  

Configure an S3 Event Notifications rule for all activities on the transactions S3 bucket to invoke an AWS Lambda function. Program the Lambda function to write the event to Amazon Kinesis Data Firehose. Configure Kinesis Data Firehose to write the event to the logs S3 bucket.

B.  

Create a trail of management events in AWS CloudTrail. Configure the trail to receive data from the transactions S3 bucket. Specify an empty prefix and write-only events. Specify the logs S3 bucket as the destination bucket.

C.  

Configure an S3 Event Notifications rule for all activities on the transactions S3 bucket to invoke an AWS Lambda function. Program the Lambda function to write the events to the logs S3 bucket.

D.  

Create a trail of data events in AWS CloudTrail. Configure the trail to receive data from the transactions S3 bucket. Specify an empty prefix and write-only events. Specify the logs S3 bucket as the destination bucket.
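For context, a data-event trail of the kind option D describes is configured with an event selector that captures write-only S3 object-level events. A hedged boto3 sketch with placeholder trail and bucket names:

import boto3

cloudtrail = boto3.client("cloudtrail")

# Trail that delivers its log files to the logs bucket.
cloudtrail.create_trail(
    Name="transactions-write-trail",          # assumed trail name
    S3BucketName="transactions-logs-bucket",  # assumed destination bucket
)

# Capture only write data events for every object (empty prefix) in the transactions bucket.
cloudtrail.put_event_selectors(
    TrailName="transactions-write-trail",
    EventSelectors=[
        {
            "ReadWriteType": "WriteOnly",
            "IncludeManagementEvents": False,
            "DataResources": [
                {"Type": "AWS::S3::Object", "Values": ["arn:aws:s3:::transactions-bucket/"]}
            ],
        }
    ],
)

cloudtrail.start_logging(Name="transactions-write-trail")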

Questions 8

A company wants to analyze sales records that the company stores in a MySQL database. The company wants to correlate the records with sales opportunities identified by Salesforce.

The company receives 2 GB of sales records every day. The company has 100 GB of identified sales opportunities. A data engineer needs to develop a process that will analyze and correlate sales records and sales opportunities. The process must run once each night.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to fetch both datasets. Use AWS Lambda functions to correlate the datasets. Use AWS Step Functions to orchestrate the process.

B.  

Use Amazon AppFlow to fetch sales opportunities from Salesforce. Use AWS Glue to fetch sales records from the MySQL database. Correlate the sales records with the sales opportunities. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the process.

C.  

Use Amazon AppFlow to fetch sales opportunities from Salesforce. Use AWS Glue to fetch sales records from the MySQL database. Correlate the sales records with sales opportunities. Use AWS Step Functions to orchestrate the process.

D.  

Use Amazon AppFlow to fetch sales opportunities from Salesforce. Use Amazon Kinesis Data Streams to fetch sales records from the MySQL database. Use Amazon Managed Service for Apache Flink to correlate the datasets. Use AWS Step Functions to orchestrate the process.

Questions 9

A company uses Amazon Athena to run SQL queries for extract, transform, and load (ETL) tasks by using Create Table As Select (CTAS). The company must use Apache Spark instead of SQL to generate analytics.

Which solution will give the company the ability to use Spark to access Athena?

Options:

A.  

Athena query settings

B.  

Athena workgroup

C.  

Athena data source

D.  

Athena query editor

Questions 10

A company has a production AWS account that runs company workloads. The company's security team created a security AWS account to store and analyze security logs from the production AWS account. The security logs in the production AWS account are stored in Amazon CloudWatch Logs.

The company needs to use Amazon Kinesis Data Streams to deliver the security logs to the security AWS account.

Which solution will meet these requirements?

Options:

A.  

Create a destination data stream in the production AWS account. In the security AWS account, create an IAM role that has cross-account permissions to Kinesis Data Streams in the production AWS account.

B.  

Create a destination data stream in the security AWS account. Create an IAM role and a trust policy to grant CloudWatch Logs the permission to put data into the stream. Create a subscription filter in the security AWS account.

C.  

Create a destination data stream in the production AWS account. In the production AWS account, create an IAM role that has cross-account permissions to Kinesis Data Streams in the security AWS account.

D.  

Create a destination data stream in the security AWS account. Create an IAM role and a trust policy to grant CloudWatch Logs the permission to put data into the stream. Create a subscription filter in the production AWS account.
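For reference, the cross-account pattern that option D describes uses a CloudWatch Logs destination in the security account to front the Kinesis data stream, and a subscription filter in the production account that points at that destination. A simplified boto3 sketch; the account IDs, names, and ARNs are placeholders, and the destination access policy step is omitted for brevity.

import boto3

# In the security account: create a destination that forwards to the Kinesis data stream.
logs_security = boto3.client("logs")  # credentials for the security account
logs_security.put_destination(
    destinationName="security-logs-destination",
    targetArn="arn:aws:kinesis:us-east-1:222222222222:stream/security-logs",
    roleArn="arn:aws:iam::222222222222:role/CWLtoKinesisRole",
)

# In the production account: subscribe the log group to the destination.
logs_production = boto3.client("logs")  # credentials for the production account
logs_production.put_subscription_filter(
    logGroupName="/workloads/security-logs",   # assumed log group
    filterName="forward-to-security-account",
    filterPattern="",                          # empty pattern forwards every event
    destinationArn="arn:aws:logs:us-east-1:222222222222:destination:security-logs-destination",
)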

Questions 11

A company stores server logs in an Amazon S3 bucket. The company needs to keep the logs for 1 year. The logs are not required after 1 year.

A data engineer needs a solution to automatically delete logs that are older than 1 year.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Define an S3 Lifecycle configuration to delete the logs after 1 year.

B.  

Create an AWS Lambda function to delete the logs after 1 year.

C.  

Schedule a cron job on an Amazon EC2 instance to delete the logs after 1 year.

D.  

Configure an AWS Step Functions state machine to delete the logs after 1 year.
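For context, the lifecycle rule that option A describes is a one-time bucket configuration; a minimal boto3 sketch with an assumed bucket name:

import boto3

s3 = boto3.client("s3")

# Expire (delete) server log objects 365 days after they are created.
s3.put_bucket_lifecycle_configuration(
    Bucket="server-logs-bucket",  # assumed bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-logs-after-1-year",
                "Filter": {"Prefix": ""},     # apply to every object in the bucket
                "Status": "Enabled",
                "Expiration": {"Days": 365},
            }
        ]
    },
)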

Questions 12

An ecommerce company wants to use AWS to migrate data pipelines from an on-premises environment into the AWS Cloud. The company currently uses a third-party tool in the on-premises environment to orchestrate data ingestion processes.

The company wants a migration solution that does not require the company to manage servers. The solution must be able to orchestrate Python and Bash scripts. The solution must not require the company to refactor any code.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

AWS Lambda

B.  

Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

C.  

AWS Step Functions

D.  

AWS Glue

Questions 13

A company is developing an application that runs on Amazon EC2 instances. Currently, the data that the application generates is temporary. However, the company needs to persist the data, even if the EC2 instances are terminated.

A data engineer must launch new EC2 instances from an Amazon Machine Image (AMI) and configure the instances to preserve the data.

Which solution will meet this requirement?

Options:

A.  

Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume that contains the application data. Apply the default settings to the EC2 instances.

B.  

Launch new EC2 instances by using an AMI that is backed by a root Amazon Elastic Block Store (Amazon EBS) volume that contains the application data. Apply the default settings to the EC2 instances.

C.  

Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume. Attach an Amazon Elastic Block Store (Amazon EBS) volume to contain the application data. Apply the default settings to the EC2 instances.

D.  

Launch new EC2 instances by using an AMI that is backed by an Amazon Elastic Block Store (Amazon EBS) volume. Attach an additional EC2 instance store volume to contain the application data. Apply the default settings to the EC2 instances.

Questions 14

A company has a data processing pipeline that includes several dozen steps. The data processing pipeline needs to send alerts in real time when a step fails or succeeds. The data processing pipeline uses a combination of Amazon S3 buckets, AWS Lambda functions, and AWS Step Functions state machines.

A data engineer needs to create a solution to monitor the entire pipeline.

Which solution will meet these requirements?

Options:

A.  

Configure the Step Functions state machines to store notifications in an Amazon S3 bucket when the state machines finish running. Enable S3 event notifications on the S3 bucket.

B.  

Configure the AWS Lambda functions to store notifications in an Amazon S3 bucket when the state machines finish running. Enable S3 event notifications on the S3 bucket.

C.  

Use AWS CloudTrail to send a message to an Amazon Simple Notification Service (Amazon SNS) topic that sends notifications when a state machine fails to run or succeeds to run.

D.  

Configure an Amazon EventBridge rule to react when the execution status of a state machine changes. Configure the rule to send a message to an Amazon Simple Notification Service (Amazon SNS) topic that sends notifications.
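For reference, the EventBridge rule that option D describes matches Step Functions execution status changes and forwards them to an SNS topic. A hedged boto3 sketch; the rule name and topic ARN are placeholders, and the topic policy must separately allow EventBridge to publish.

import boto3
import json

events = boto3.client("events")

# Match execution status changes for the pipeline's state machines.
events.put_rule(
    Name="pipeline-state-machine-status",  # assumed rule name
    EventPattern=json.dumps(
        {
            "source": ["aws.states"],
            "detail-type": ["Step Functions Execution Status Change"],
            "detail": {"status": ["SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"]},
        }
    ),
)

# Send matching events to an SNS topic that fans out the notifications.
events.put_targets(
    Rule="pipeline-state-machine-status",
    Targets=[{"Id": "notify-sns", "Arn": "arn:aws:sns:us-east-1:123456789012:pipeline-alerts"}],
)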

Questions 15

A company uses Amazon Redshift as a data warehouse solution. One of the datasets that the company stores in Amazon Redshift contains data for a vendor.

Recently, the vendor asked the company to transfer the vendor's data into the vendor's Amazon S3 bucket once each week.

Which solution will meet this requirement?

Options:

A.  

Create an AWS Lambda function to connect to the Redshift data warehouse. Configure the Lambda function to use the Redshift COPY command to copy the required data to the vendor's S3 bucket on a schedule.

B.  

Create an AWS Glue job to connect to the Redshift data warehouse. Configure the AWS Glue job to use the Redshift UNLOAD command to load the required data to the vendor's S3 bucket on a schedule.

C.  

Use the Amazon Redshift data sharing feature. Set the vendor's S3 bucket as the destination. Configure the source as a custom SQL query that selects the required data.

D.  

Configure Amazon Redshift Spectrum to use the vendor's S3 bucket as the destination. Enable data querying in both directions.
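For context, the core of the scheduled export that option B describes is an UNLOAD statement; the sketch below runs it through the Redshift Data API, with the cluster, database, table, IAM role, and vendor bucket all assumed.

import boto3

redshift_data = boto3.client("redshift-data")

# UNLOAD the vendor's rows from the warehouse into the vendor's S3 bucket as Parquet.
unload_sql = """
UNLOAD ('SELECT * FROM sales.vendor_orders WHERE vendor_id = 42')
TO 's3://vendor-bucket/weekly-export/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
FORMAT AS PARQUET;
"""

redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",  # assumed cluster
    Database="prod",                        # assumed database
    DbUser="etl_user",                      # assumed database user
    Sql=unload_sql,
)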

Questions 16

A company is using an AWS Transfer Family server to migrate data from an on-premises environment to AWS. Company policy mandates the use of TLS 1.2 or above to encrypt the data in transit.

Which solution will meet these requirements?

Options:

A.  

Generate new SSH keys for the Transfer Family server. Make the old keys and the new keys available for use.

B.  

Update the security group rules for the on-premises network to allow only connections that use TLS 1.2 or above.

C.  

Update the security policy of the Transfer Family server to specify a minimum protocol version of TLS 1.2.

D.  

Install an SSL certificate on the Transfer Family server to encrypt data transfers by using TLS 1.2.

Questions 17

A company has three subsidiaries. Each subsidiary uses a different data warehousing solution. The first subsidiary hosts its data warehouse in Amazon Redshift. The second subsidiary uses Teradata Vantage on AWS. The third subsidiary uses Google BigQuery.

The company wants to aggregate all the data into a central Amazon S3 data lake. The company wants to use Apache Iceberg as the table format.

A data engineer needs to build a new pipeline to connect to all the data sources, run transformations by using each source engine, join the data, and write the data to Iceberg.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A.  

Use native Amazon Redshift, Teradata, and BigQuery connectors to build the pipeline in AWS Glue. Use native AWS Glue transforms to join the data. Run a Merge operation on the data lake Iceberg table.

B.  

Use the Amazon Athena federated query connectors for Amazon Redshift, Teradata, and BigQuery to build the pipeline in Athena. Write a SQL query to read from all the data sources, join the data, and run a Merge operation on the data lake Iceberg table.

C.  

Use the native Amazon Redshift connector, the Java Database Connectivity (JDBC) connector for Teradata, and the open source Apache Spark BigQuery connector to build the pipeline in Amazon EMR. Write code in PySpark to join the data. Run a Merge operation on the data lake Iceberg table.

D.  

Use the native Amazon Redshift, Teradata, and BigQuery connectors in Amazon Appflow to write data to Amazon S3 and AWS Glue Data Catalog. Use Amazon Athena to join the data. Run a Merge operation on the data lake Iceberg table.

Questions 18

An ecommerce company processes millions of orders each day. The company uses AWS Glue ETL to collect data from multiple sources, clean the data, and store the data in an Amazon S3 bucket in CSV format by using the S3 Standard storage class. The company uses the stored data to conduct daily analysis.

The company wants to optimize costs for data storage and retrieval.

Which solution will meet this requirement?

Options:

A.  

Transition the data to Amazon S3 Glacier Flexible Retrieval.

B.  

Transition the data from Amazon S3 to an Amazon Aurora cluster.

C.  

Configure AWS Glue ETL to transform the incoming data to Apache Parquet format.

D.  

Configure AWS Glue ETL to use Amazon EMR to process incoming data in parallel.

Questions 19

A company is building an inventory management system and an inventory reordering system to automatically reorder products. Both systems use Amazon Kinesis Data Streams. The inventory management system uses the Amazon Kinesis Producer Library (KPL) to publish data to a stream. The inventory reordering system uses the Amazon Kinesis Client Library (KCL) to consume data from the stream. The company configures the stream to scale up and down as needed.

Before the company deploys the systems to production, the company discovers that the inventory reordering system received duplicated data.

Which factors could have caused the reordering system to receive duplicated data? (Select TWO.)

Options:

A.  

The producer experienced network-related timeouts.

B.  

The stream's value for the IteratorAgeMilliseconds metric was too high.

C.  

There was a change in the number of shards, record processors, or both.

D.  

The AggregationEnabled configuration property was set to true.

E.  

The max_records configuration property was set to a number that was too high.

Questions 20

A mobile gaming company wants to capture data from its gaming app. The company wants to make the data available to three internal consumers of the data. The data records are approximately 20 KB in size.

The company wants to achieve optimal throughput from each device that runs the gaming app. Additionally, the company wants to develop an application to process data streams. The stream-processing application must have dedicated throughput for each internal consumer.

Which solution will meet these requirements?

Options:

A.  

Configure the mobile app to call the PutRecords API operation to send data to Amazon Kinesis Data Streams. Use the enhanced fan-out feature with a stream for each internal consumer.

B.  

Configure the mobile app to call the PutRecordBatch API operation to send data to Amazon Data Firehose. Submit an AWS Support case to turn on dedicated throughput for the company's AWS account. Allow each internal consumer to access the stream.

C.  

Configure the mobile app to use the Amazon Kinesis Producer Library (KPL) to send data to Amazon Data Firehose. Use the enhanced fan-out feature with a stream for each internal consumer.

D.  

Configure the mobile app to call the PutRecords API operation to send data to Amazon Kinesis Data Streams. Host the stream-processing application for each internal consumer on Amazon EC2 instances. Configure auto scaling for the EC2 instances.

Questions 21

A data engineer has two datasets that contain sales information for multiple cities and states. One dataset is named reference, and the other dataset is named primary.

The data engineer needs a solution to determine whether a specific set of values in the city and state columns of the primary dataset exactly match the same specific values in the reference dataset. The data engineer wants to use Data Quality Definition Language (DQDL) rules in an AWS Glue Data Quality job.

Which rule will meet these requirements?

Options:

A.  

DatasetMatch "reference" "city->ref_city, state->ref_state" = 1.0

B.  

ReferentialIntegrity "city,state" "reference.{ref_city,ref_state}" = 1.0

C.  

DatasetMatch "reference" "city->ref_city, state->ref_state" = 100

D.  

ReferentialIntegrity "city,state" "reference.{ref_city,ref_state}" = 100
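For context, DQDL expresses match results as a ratio between 0 and 1, so a full match is written as = 1.0. A rule such as the one in option A would typically be registered in a Glue Data Quality ruleset; a hedged boto3 sketch with assumed database and table names:

import boto3

glue = boto3.client("glue")

# Ruleset requiring every (city, state) pair in the primary dataset to match the
# reference dataset (a DatasetMatch result of 1.0 means a 100% match ratio).
glue.create_data_quality_ruleset(
    Name="primary-vs-reference-match",  # assumed ruleset name
    Ruleset='Rules = [ DatasetMatch "reference" "city->ref_city, state->ref_state" = 1.0 ]',
    TargetTable={"DatabaseName": "sales_db", "TableName": "primary"},  # assumed names
)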

Questions 22

A company needs to partition the Amazon S3 storage that the company uses for a data lake. The partitioning will use a path of the S3 object keys in the following format: s3://bucket/prefix/year=2023/month=01/day=01.

A data engineer must ensure that the AWS Glue Data Catalog synchronizes with the S3 storage when the company adds new partitions to the bucket.

Which solution will meet these requirements with the LEAST latency?

Options:

A.  

Schedule an AWS Glue crawler to run every morning.

B.  

Manually run the AWS Glue CreatePartition API twice each day.

C.  

Use the code that writes data to Amazon S3 to invoke the Boto3 AWS Glue create_partition API call.

D.  

Run the MSCK REPAIR TABLE command from the AWS Glue console.
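For reference, option C registers each new partition from the same code that writes the data, which avoids waiting for a crawler or scheduled job. A minimal boto3 sketch with assumed database and table names:

import boto3

glue = boto3.client("glue")

# Register s3://bucket/prefix/year=2023/month=01/day=01 right after the data is written.
glue.create_partition(
    DatabaseName="datalake_db",  # assumed database
    TableName="events",          # assumed table
    PartitionInput={
        "Values": ["2023", "01", "01"],
        "StorageDescriptor": {
            "Location": "s3://bucket/prefix/year=2023/month=01/day=01/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
            },
        },
    },
)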

Questions 23

A company stores employee data in Amazon Redshift. A table named Employee uses columns named Region ID, Department ID, and Role ID as a compound sort key. Which queries will MOST increase the speed of a query by using the compound sort key of the table? (Select TWO.)

Options:

A.  

Select * from Employee where Region ID='North America';

B.  

Select * from Employee where Region ID='North America' and Department ID=20;

C.  

Select * from Employee where Department ID=20 and Region ID='North America';

D.  

Select * from Employee where Role ID=50;

E.  

Select * from Employee where Region ID='North America' and Role ID=50;

Questions 24

A company uses Amazon S3 buckets, AWS Glue tables, and Amazon Athena as components of a data lake. Recently, the company expanded its sales range to multiple new states. The company wants to introduce state names as a new partition to the existing S3 bucket, which is currently partitioned by date.

The company needs to ensure that additional partitions will not disrupt daily synchronization between the AWS Glue Data Catalog and the S3 buckets.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Use the AWS Glue API to manually update the Data Catalog.

B.  

Run an MSCK REPAIR TABLE command in Athena.

C.  

Schedule an AWS Glue crawler to periodically update the Data Catalog.

D.  

Run a REFRESH TABLE command in Athena.

Questions 25

A data engineer needs to securely transfer 5 TB of data from an on-premises data center to an Amazon S3 bucket. Approximately 5% of the data changes every day. Updates to the data need to be regularly propagated to the S3 bucket. The data includes files that are in multiple formats. The data engineer needs to automate the transfer process and must schedule the process to run periodically.

Which AWS service should the data engineer use to transfer the data in the MOST operationally efficient way?

Options:

A.  

AWS DataSync

B.  

AWS Glue

C.  

AWS Direct Connect

D.  

Amazon S3 Transfer Acceleration

Questions 26

A sales company uses AWS Glue ETL to collect, process, and ingest data into an Amazon S3 bucket. The AWS Glue pipeline creates a new file in the S3 bucket every hour. File sizes vary from 200 KB to 300 KB. The company wants to build a sales prediction model by using data from the previous 5 years. The historic data includes 44,000 files.

The company builds a second AWS Glue ETL pipeline by using the smallest worker type. The second pipeline retrieves the historic files from the S3 bucket and processes the files for downstream analysis. The company notices significant performance issues with the second ETL pipeline.

The company needs to improve the performance of the second pipeline.

Which solution will meet this requirement MOST cost-effectively?

Options:

A.  

Use a larger worker type.

B.  

Increase the number of workers in the AWS Glue ETL jobs.

C.  

Use the AWS Glue DynamicFrame grouping option.

D.  

Enable AWS Glue auto scaling.
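For context, the grouping option that option C describes coalesces many small S3 files into larger read groups so the job does not create one Spark task per tiny file. A hedged sketch of how it might appear in a Glue job script; the paths and format are assumptions.

# Inside an AWS Glue job script; Glue provides the Spark context at run time.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# groupFiles/groupSize read the ~44,000 small files in roughly 100 MB groups.
historic = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://sales-data-bucket/hourly/"],  # assumed path
        "recurse": True,
        "groupFiles": "inPartition",
        "groupSize": "104857600",  # 100 MB, in bytes
    },
    format="json",  # assumed input format
)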

Questions 27

A company is using Amazon Redshift to build a data warehouse solution. The company is loading hundreds of files into a fact table that is in a Redshift cluster.

The company wants the data warehouse solution to achieve the greatest possible throughput. The solution must use cluster resources optimally when the company loads data into the fact table.

Which solution will meet these requirements?

Options:

A.  

Use multiple COPY commands to load the data into the Redshift cluster.

B.  

Use S3DistCp to load multiple files into Hadoop Distributed File System (HDFS). Use an HDFS connector to ingest the data into the Redshift cluster.

C.  

Use a number of INSERT statements equal to the number of Redshift cluster nodes. Load the data in parallel into each node.

D.  

Use a single COPY command to load the data into the Redshift cluster.

Questions 28

A company stores customer data that contains personally identifiable information (PII) in an Amazon Redshift cluster. The company's marketing, claims, and analytics teams need to be able to access the customer data.

The marketing team should have access to obfuscated claim information but should have full access to customer contact information.

The claims team should have access to customer information for each claim that the team processes.

The analytics team should have access only to obfuscated PII data.

Which solution will enforce these data access requirements with the LEAST administrative overhead?

Options:

A.  

Create a separate Redshift cluster for each team. Load only the required data for each team. Restrict access to clusters based on the teams.

B.  

Create views that include required fields for each of the data requirements. Grant the teams access only to the view that each team requires.

C.  

Create a separate Amazon Redshift database role for each team. Define masking policies that apply for each team separately. Attach appropriate masking policies to each team role.

D.  

Move the customer data to an Amazon S3 bucket. Use AWS Lake Formation to create a data lake. Use fine-grained security capabilities to grant each team appropriate permissions to access the data.

Questions 29

A retail company is using an Amazon Redshift cluster to support real-time inventory management. The company has deployed an ML model on a real-time endpoint in Amazon SageMaker.

The company wants to make real-time inventory recommendations. The company also wants to make predictions about future inventory needs.

Which solutions will meet these requirements? (Select TWO.)

Options:

A.  

Use Amazon Redshift ML to generate inventory recommendations.

B.  

Use SQL to invoke a remote SageMaker endpoint for prediction.

C.  

Use Amazon Redshift ML to schedule regular data exports for offline model training.

D.  

Use SageMaker Autopilot to create inventory management dashboards in Amazon Redshift.

E.  

Use Amazon Redshift as a file storage system to archive old inventory management reports.

Questions 30

A company's data engineer needs to optimize the performance of table SQL queries. The company stores data in an Amazon Redshift cluster. The data engineer cannot increase the size of the cluster because of budget constraints.

The company stores the data in multiple tables and loads the data by using the EVEN distribution style. Some tables are hundreds of gigabytes in size. Other tables are less than 10 MB in size.

Which solution will meet these requirements?

Options:

A.  

Keep using the EVEN distribution style for all tables. Specify primary and foreign keys for all tables.

B.  

Use the ALL distribution style for large tables. Specify primary and foreign keys for all tables.

C.  

Use the ALL distribution style for rarely updated small tables. Specify primary and foreign keys for all tables.

D.  

Specify a combination of distribution, sort, and partition keys for all tables.

Questions 31

A company maintains an Amazon Redshift provisioned cluster that the company uses for extract, transform, and load (ETL) operations to support critical analysis tasks. A sales team within the company maintains a Redshift cluster that the sales team uses for business intelligence (BI) tasks.

The sales team recently requested access to the data that is in the ETL Redshift cluster so the team can perform weekly summary analysis tasks. The sales team needs to join data from the ETL cluster with data that is in the sales team's BI cluster.

The company needs a solution that will share the ETL cluster data with the sales team without interrupting the critical analysis tasks. The solution must minimize usage of the computing resources of the ETL cluster.

Which solution will meet these requirements?

Options:

A.  

Set up the sales team's BI cluster as a consumer of the ETL cluster by using Redshift data sharing.

B.  

Create materialized views based on the sales team's requirements. Grant the sales team direct access to the ETL cluster.

C.  

Create database views based on the sales team's requirements. Grant the sales team direct access to the ETL cluster.

D.  

Unload a copy of the data from the ETL cluster to an Amazon S3 bucket every week. Create an Amazon Redshift Spectrum table based on the content of the ETL cluster.
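For reference, the data sharing setup that option A describes is mostly SQL on the producer (ETL) cluster; the consumer cluster then creates a database from the share. A hedged sketch run through the Redshift Data API, with the cluster, schema, and namespace GUID all assumed:

import boto3

redshift_data = boto3.client("redshift-data")

# Producer side (ETL cluster): publish a schema and grant the BI cluster's namespace access.
producer_sql = [
    "CREATE DATASHARE etl_share;",
    "ALTER DATASHARE etl_share ADD SCHEMA analytics;",
    "ALTER DATASHARE etl_share ADD ALL TABLES IN SCHEMA analytics;",
    "GRANT USAGE ON DATASHARE etl_share TO NAMESPACE '11111111-2222-3333-4444-555555555555';",
]

for statement in producer_sql:
    redshift_data.execute_statement(
        ClusterIdentifier="etl-cluster",  # assumed producer cluster
        Database="prod",                  # assumed database
        DbUser="admin",                   # assumed database user
        Sql=statement,
    )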

Questions 32

A retail company uses an Amazon Redshift data warehouse and an Amazon S3 bucket. The company ingests retail order data into the S3 bucket every day.

The company stores all order data at a single path within the S3 bucket. The data has more than 100 columns. The company ingests the order data from a third-party application that generates more than 30 files in CSV format every day. Each CSV file is between 50 and 70 MB in size.

The company uses Amazon Redshift Spectrum to run queries that select sets of columns. Users aggregate metrics based on daily orders. Recently, users have reported that the performance of the queries has degraded. A data engineer must resolve the performance issues for the queries.

Which combination of steps will meet this requirement with LEAST developmental effort? (Select TWO.)

Options:

A.  

Configure the third-party application to create the files in a columnar format.

B.  

Develop an AWS Glue ETL job to convert the multiple daily CSV files to one file for each day.

C.  

Partition the order data in the S3 bucket based on order date.

D.  

Configure the third-party application to create the files in JSON format.

E.  

Load the JSON data into the Amazon Redshift table in a SUPER type column.

Questions 33

A data engineer needs to maintain a central metadata repository that users access through Amazon EMR and Amazon Athena queries. The repository needs to provide the schema and properties of many tables. Some of the metadata is stored in Apache Hive. The data engineer needs to import the metadata from Hive into the central metadata repository.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.  

Use Amazon EMR and Apache Ranger.

B.  

Use a Hive metastore on an EMR cluster.

C.  

Use the AWS Glue Data Catalog.

D.  

Use a metastore on an Amazon RDS for MySQL DB instance.

Questions 34

A company uses an Amazon Redshift cluster that runs on RA3 nodes. The company wants to scale read and write capacity to meet demand. A data engineer needs to identify a solution that will turn on concurrency scaling.

Which solution will meet this requirement?

Options:

A.  

Turn on concurrency scaling in workload management (WLM) for Redshift Serverless workgroups.

B.  

Turn on concurrency scaling at the workload management (WLM) queue level in the Redshift cluster.

C.  

Turn on concurrency scaling in the settings during the creation of a new Redshift cluster.

D.  

Turn on concurrency scaling for the daily usage quota for the Redshift cluster.

Questions 35

A company currently uses a provisioned Amazon EMR cluster that includes general purpose Amazon EC2 instances. The EMR cluster uses EMR managed scaling between one and five task nodes for the company's long-running Apache Spark extract, transform, and load (ETL) job. The company runs the ETL job every day.

When the company runs the ETL job, the EMR cluster quickly scales up to five nodes. The EMR cluster often reaches maximum CPU usage, but the memory usage remains under 30%.

The company wants to modify the EMR cluster configuration to reduce the EMR costs to run the daily ETL job.

Which solution will meet these requirements MOST cost-effectively?

Options:

A.  

Increase the maximum number of task nodes for EMR managed scaling to 10.

B.  

Change the task node type from general purpose EC2 instances to memory optimized EC2 instances.

C.  

Switch the task node type from general purpose EC2 instances to compute optimized EC2 instances.

D.  

Reduce the scaling cooldown period for the provisioned EMR cluster.

Questions 36

A company uses Amazon S3 to store semi-structured data in a transactional data lake. Some of the data files are small, but other data files are tens of terabytes.

A data engineer must perform a change data capture (CDC) operation to identify changed data from the data source. The data source sends a full snapshot as a JSON file every day and ingests the changed data into the data lake.

Which solution will capture the changed data MOST cost-effectively?

Options:

A.  

Create an AWS Lambda function to identify the changes between the previous data and the current data. Configure the Lambda function to ingest the changes into the data lake.

B.  

Ingest the data into Amazon RDS for MySQL. Use AWS Database Migration Service (AWS DMS) to write the changed data to the data lake.

C.  

Use an open source data lake format to merge the data source with the S3 data lake to insert the new data and update the existing data.

D.  

Ingest the data into an Amazon Aurora MySQL DB instance that runs Aurora Serverless. Use AWS Database Migration Service (AWS DMS) to write the changed data to the data lake.

Questions 37

A banking company uses an application to collect large volumes of transactional data. The company uses Amazon Kinesis Data Streams for real-time analytics. The company's application uses the PutRecord action to send data to Kinesis Data Streams.

A data engineer has observed network outages during certain times of day. The data engineer wants to configure exactly-once delivery for the entire processing pipeline.

Which solution will meet this requirement?

Options:

A.  

Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.

B.  

Update the checkpoint configuration of the Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) data collection application to avoid duplicate processing of events.

C.  

Design the data source so events are not ingested into Kinesis Data Streams multiple times.

D.  

Stop using Kinesis Data Streams. Use Amazon EMR instead. Use Apache Flink and Apache Spark Streaming in Amazon EMR.

Questions 38

A company needs to load customer data that comes from a third party into an Amazon Redshift data warehouse. The company stores order data and product data in the same data warehouse. The company wants to use the combined dataset to identify potential new customers.

A data engineer notices that one of the fields in the source data includes values that are in JSON format.

How should the data engineer load the JSON data into the data warehouse with the LEAST effort?

Options:

A.  

Use the SUPER data type to store the data in the Amazon Redshift table.

B.  

Use AWS Glue to flatten the JSON data and ingest it into the Amazon Redshift table.

C.  

Use Amazon S3 to store the JSON data. Use Amazon Athena to query the data.

D.  

Use an AWS Lambda function to flatten the JSON data. Store the data in Amazon S3.

Questions 39

A company saves customer data to an Amazon S3 bucket. The company uses server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the bucket. The dataset includes personally identifiable information (PII) such as social security numbers and account details.

Data that is tagged as PII must be masked before the company uses customer data for analysis. Some users must have secure access to the PII data during the preprocessing phase. The company needs a low-maintenance solution to mask and secure the PII data throughout the entire engineering pipeline.

Which combination of solutions will meet these requirements? (Select TWO.)

Options:

A.  

Use AWS Glue DataBrew to perform extract, transform, and load (ETL) tasks that mask the PII data before analysis.

B.  

Use Amazon GuardDuty to monitor access patterns for the PII data that is used in the engineering pipeline.

C.  

Configure an Amazon Macie discovery job for the S3 bucket.

D.  

Use AWS Identity and Access Management (IAM) to manage permissions and to control access to the PII data.

E.  

Write custom scripts in an application to mask the PII data and to control access.

Questions 40

A company uses Amazon RDS for MySQL as the database for a critical application. The database workload is mostly writes, with a small number of reads.

A data engineer notices that the CPU utilization of the DB instance is very high. The high CPU utilization is slowing down the application. The data engineer must reduce the CPU utilization of the DB Instance.

Which actions should the data engineer take to meet this requirement? (Choose two.)

Options:

A.  

Use the Performance Insights feature of Amazon RDS to identify queries that have high CPU utilization. Optimize the problematic queries.

B.  

Modify the database schema to include additional tables and indexes.

C.  

Reboot the RDS DB instance once each week.

D.  

Upgrade to a larger instance size.

E.  

Implement caching to reduce the database query load.

Questions 41

A company stores logs in an Amazon S3 bucket. When a data engineer attempts to access several log files, the data engineer discovers that some files have been unintentionally deleted.

The data engineer needs a solution that will prevent unintentional file deletion in the future.

Which solution will meet this requirement with the LEAST operational overhead?

Options:

A.  

Manually back up the S3 bucket on a regular basis.

B.  

Enable S3 Versioning for the S3 bucket.

C.  

Configure replication for the S3 bucket.

D.  

Use an Amazon S3 Glacier storage class to archive the data that is in the S3 bucket.
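For context, the versioning that option B describes is a single bucket-level setting; once it is on, a delete only adds a delete marker and earlier versions stay recoverable. A minimal boto3 sketch with an assumed bucket name:

import boto3

s3 = boto3.client("s3")

# Turn on versioning so unintentional deletes leave prior object versions intact.
s3.put_bucket_versioning(
    Bucket="application-logs-bucket",  # assumed bucket name
    VersioningConfiguration={"Status": "Enabled"},
)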

Questions 42

A company is migrating on-premises workloads to AWS. The company wants to reduce overall operational overhead. The company also wants to explore serverless options.

The company's current workloads use Apache Pig, Apache Oozie, Apache Spark, Apache Hbase, and Apache Flink. The on-premises workloads process petabytes of data in seconds. The company must maintain similar or better performance after the migration to AWS.

Which extract, transform, and load (ETL) service will meet these requirements?

Options:

A.  

AWS Glue

B.  

Amazon EMR

C.  

AWS Lambda

D.  

Amazon Redshift

Questions 43

A company has an application that uses an Amazon API Gateway REST API and an AWS Lambda function to retrieve data from an Amazon DynamoDB table. Users recently reported intermittent high latency in the application's response times. A data engineer finds that the Lambda function experiences frequent throttling when the company's other Lambda functions experience increased invocations.

The company wants to ensure the API's Lambda function operates without being affected by other Lambda functions.

Which solution will meet this requirement MOST cost-effectively?

Options:

A.  

Increase the number of read capacity units (RCUs) in DynamoDB.

B.  

Configure provisioned concurrency for the Lambda function.

C.  

Configure reserved concurrency for the Lambda function.

D.  

Increase the Lambda function timeout and allocated memory.
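For reference, the reserved concurrency that option C describes carves out a slice of the account's concurrency pool for one function, so spikes in other functions cannot throttle it. A minimal boto3 sketch with an assumed function name and limit:

import boto3

lambda_client = boto3.client("lambda")

# Reserve concurrency for the API's function so other functions cannot consume its share.
lambda_client.put_function_concurrency(
    FunctionName="api-dynamodb-reader",  # assumed function name
    ReservedConcurrentExecutions=100,    # assumed value
)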

Questions 44

A company uses an on-premises Microsoft SQL Server database to store financial transaction data. The company migrates the transaction data from the on-premises database to AWS at the end of each month. The company has noticed that the cost to migrate data from the on-premises database to an Amazon RDS for SQL Server database has increased recently.

The company requires a cost-effective solution to migrate the data to AWS. The solution must cause minimal downtime for the applications that access the database.

Which AWS service should the company use to meet these requirements?

Options:

A.  

AWS Lambda

B.  

AWS Database Migration Service (AWS DMS)

C.  

AWS Direct Connect

D.  

AWS DataSync

Questions 45

A company stores petabytes of data in thousands of Amazon S3 buckets in the S3 Standard storage class. The data supports analytics workloads that have unpredictable and variable data access patterns.

The company does not access some data for months. However, the company must be able to retrieve all data within milliseconds. The company needs to optimize S3 storage costs.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Use S3 Storage Lens standard metrics to determine when to move objects to more cost-optimized storage classes. Create S3 Lifecycle policies for the S3 buckets to move objects to cost-optimized storage classes. Continue to refine the S3 Lifecycle policies in the future to optimize storage costs.

B.  

Use S3 Storage Lens activity metrics to identify S3 buckets that the company accesses infrequently. Configure S3 Lifecycle rules to move objects from S3 Standard to the S3 Standard-Infrequent Access (S3 Standard-IA) and S3 Glacier storage classes based on the age of the data.

C.  

Use S3 Intelligent-Tiering. Activate the Deep Archive Access tier.

D.  

Use S3 Intelligent-Tiering. Use the default access tier.

Questions 46

A media company uses software as a service (SaaS) applications to gather data by using third-party tools. The company needs to store the data in an Amazon S3 bucket. The company will use Amazon Redshift to perform analytics based on the data.

Which AWS service or feature will meet these requirements with the LEAST operational overhead?

Options:

A.  

Amazon Managed Streaming for Apache Kafka (Amazon MSK)

B.  

Amazon AppFlow

C.  

AWS Glue Data Catalog

D.  

Amazon Kinesis

Questions 47

A company is planning to upgrade its Amazon Elastic Block Store (Amazon EBS) General Purpose SSD storage from gp2 to gp3. The company wants to prevent any interruptions in its Amazon EC2 instances that will cause data loss during the migration to the upgraded storage.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Create snapshots of the gp2 volumes. Create new gp3 volumes from the snapshots. Attach the new gp3 volumes to the EC2 instances.

B.  

Create new gp3 volumes. Gradually transfer the data to the new gp3 volumes. When the transfer is complete, mount the new gp3 volumes to the EC2 instances to replace the gp2 volumes.

C.  

Change the volume type of the existing gp2 volumes to gp3. Enter new values for volume size, IOPS, and throughput.

D.  

Use AWS DataSync to create new gp3 volumes. Transfer the data from the original gp2 volumes to the new gp3 volumes.
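For context, the in-place change that option C describes uses Elastic Volumes, so the volume stays attached and in use during the modification. A minimal boto3 sketch with a placeholder volume ID and baseline gp3 settings:

import boto3

ec2 = boto3.client("ec2")

# Change the volume type in place; the EC2 instance keeps running during the modification.
ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",  # placeholder volume ID
    VolumeType="gp3",
    Iops=3000,        # gp3 baseline IOPS (adjust as needed)
    Throughput=125,   # gp3 baseline throughput in MiB/s
)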

Questions 48

A company receives test results from testing facilities that are located around the world. The company stores the test results in millions of 1 KB JSON files in an Amazon S3 bucket. A data engineer needs to process the files, convert them into Apache Parquet format, and load them into Amazon Redshift tables. The data engineer uses AWS Glue to process the files, AWS Step Functions to orchestrate the processes, and Amazon EventBridge to schedule jobs.

The company recently added more testing facilities. The time required to process files is increasing. The data engineer must reduce the data processing time.

Which solution will MOST reduce the data processing time?

Options:

A.  

Use AWS Lambda to group the raw input files into larger files. Write the larger files back to Amazon S3. Use AWS Glue to process the files. Load the files into the Amazon Redshift tables.

B.  

Use the AWS Glue dynamic frame file-grouping option to ingest the raw input files. Process the files. Load the files into the Amazon Redshift tables.

C.  

Use the Amazon Redshift COPY command to move the raw input files from Amazon S3 directly into the Amazon Redshift tables. Process the files in Amazon Redshift.

D.  

Use Amazon EMR instead of AWS Glue to group the raw input files. Process the files in Amazon EMR. Load the files into the Amazon Redshift tables.

Questions 49

A company created an extract, transform, and load (ETL) data pipeline in AWS Glue. A data engineer must crawl a table that is in Microsoft SQL Server. The data engineer needs to extract, transform, and load the output of the crawl to an Amazon S3 bucket. The data engineer also must orchestrate the data pipeline.

Which AWS service or feature will meet these requirements MOST cost-effectively?

Options:

A.  

AWS Step Functions

B.  

AWS Glue workflows

C.  

AWS Glue Studio

D.  

Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

Questions 50

A data engineer uses Amazon Redshift to run resource-intensive analytics processes once every month. Every month, the data engineer creates a new Redshift provisioned cluster. The data engineer deletes the Redshift provisioned cluster after the analytics processes are complete every month. Before the data engineer deletes the cluster each month, the data engineer unloads backup data from the cluster to an Amazon S3 bucket.

The data engineer needs a solution to run the monthly analytics processes that does not require the data engineer to manage the infrastructure manually.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Use AWS Step Functions to pause the Redshift cluster when the analytics processes are complete and to resume the cluster to run new processes every month.

B.  

Use Amazon Redshift Serverless to automatically process the analytics workload.

C.  

Use the AWS CLI to automatically process the analytics workload.

D.  

Use AWS CloudFormation templates to automatically process the analytics workload.

Questions 51

A company loads transaction data for each day into Amazon Redshift tables at the end of each day. The company wants to have the ability to track which tables have been loaded and which tables still need to be loaded.

A data engineer wants to store the load statuses of Redshift tables in an Amazon DynamoDB table. The data engineer creates an AWS Lambda function to publish the details of the load statuses to DynamoDB.

How should the data engineer invoke the Lambda function to write load statuses to the DynamoDB table?

Options:

A.  

Use a second Lambda function to invoke the first Lambda function based on Amazon CloudWatch events.

B.  

Use the Amazon Redshift Data API to publish an event to Amazon EventBridge. Configure an EventBridge rule to invoke the Lambda function.

C.  

Use the Amazon Redshift Data API to publish a message to an Amazon Simple Queue Service (Amazon SQS) queue. Configure the SQS queue to invoke the Lambda function.

D.  

Use a second Lambda function to invoke the first Lambda function based on AWS CloudTrail events.

Questions 52

A company wants to implement real-time analytics capabilities. The company wants to use Amazon Kinesis Data Streams and Amazon Redshift to ingest and process streaming data at the rate of several gigabytes per second. The company wants to derive near real-time insights by using existing business intelligence (BI) and analytics tools.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Use Kinesis Data Streams to stage data in Amazon S3. Use the COPY command to load data from Amazon S3 directly into Amazon Redshift to make the data immediately available for real-time analysis.

B.  

Access the data from Kinesis Data Streams by using SQL queries. Create materialized views directly on top of the stream. Refresh the materialized views regularly to query the most recent stream data.

C.  

Create an external schema in Amazon Redshift to map the data from Kinesis Data Streams to an Amazon Redshift object. Create a materialized view to read data from the stream. Set the materialized view to auto refresh.

D.  

Connect Kinesis Data Streams to Amazon Kinesis Data Firehose. Use Kinesis Data Firehose to stage the data in Amazon S3. Use the COPY command to load the data from Amazon S3 to a table in Amazon Redshift.
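For reference, the streaming ingestion that option C describes maps the stream through an external schema and reads it with an auto-refreshing materialized view. A hedged sketch of the SQL, submitted through the Redshift Data API; the cluster, IAM role, and stream names are assumptions.

import boto3

redshift_data = boto3.client("redshift-data")

streaming_sql = [
    # Map the Kinesis data stream into Redshift.
    """CREATE EXTERNAL SCHEMA kinesis_schema
       FROM KINESIS
       IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftStreamingRole';""",
    # Materialized view that auto-refreshes as new records arrive on the stream.
    """CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
       SELECT approximate_arrival_timestamp,
              JSON_PARSE(kinesis_data) AS payload
       FROM kinesis_schema."ingest-stream";""",
]

redshift_data.batch_execute_statement(
    ClusterIdentifier="analytics-cluster",  # assumed cluster
    Database="prod",                        # assumed database
    DbUser="analytics_user",                # assumed database user
    Sqls=streaming_sql,
)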
