Name: Latest valid Data-Engineer-Associate certification exam training materials (290 questions) - Free download of sample Data-Engineer-Associate questions
Brand: Amazon
SKU: VSA4A4024EABCC4B14
Price: 59.98 USD
Availability: InStock
Rating: 4.9 (1088 reviews)

Latest free AWS Certified Data Engineer Data-Engineer-Associate real exam questions:

1. A company analyzes data in a data lake every quarter to perform inventory assessments. A data engineer uses AWS Glue DataBrew to detect any personally identifiable information (PII) about customers within the data.
The company's privacy policy considers some custom categories of information to be PII. However, these categories are not included in the standard DataBrew data quality rules.
The data engineer needs to modify the current process to scan for the custom PII categories across multiple datasets within the data lake.
Which solution will meet these requirements with the LEAST operational overhead?

A) Implement custom data quality rules in DataBrew. Apply the custom rules across datasets.
B) Develop custom Python scripts to detect the custom PII categories. Call the scripts from DataBrew.
C) Manually review the data for custom PII categories.
D) Implement regex patterns to extract PII information from fields during extract, transform, and load (ETL) operations into the data lake.

2. A data engineer is processing a large amount of log data from web servers. The data is stored in an Amazon S3 bucket. The data engineer uses AWS services to process the data every day. The data engineer needs to extract specific fields from the raw log data and load the data into a data warehouse for analysis.
Which solution will meet these requirements?

A) Use Amazon EMR to run Apache Hive queries on the raw log files in the S3 bucket to extract the specified fields. Store the output as ORC files in the original S3 bucket.
B) Use AWS Glue DataBrew to run AWS Glue ETL jobs on a schedule to extract the specified fields from the raw log files in the S3 bucket. Load the data into partitioned tables in Amazon Redshift.
C) Use an AWS Glue crawler to parse the raw log data in the S3 bucket and to generate a schema. Use AWS Glue ETL jobs to extract and transform the data and to load it into Amazon Redshift.
D) Use AWS Step Functions to orchestrate a series of AWS Batch jobs to parse the raw log files. Load the specified fields into an Amazon RDS for PostgreSQL database.

3. A company needs to set up a data catalog and metadata management for data sources that run in the AWS Cloud. The company will use the data catalog to maintain the metadata of all the objects that are in a set of data stores. The data stores include structured sources such as Amazon RDS and Amazon Redshift. The data stores also include semistructured sources such as JSON files and .xml files that are stored in Amazon S3.
The company needs a solution that will update the data catalog on a regular basis. The solution also must detect changes to the source metadata.
Which solution will meet these requirements with the LEAST operational overhead?

A) Use Amazon Aurora as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the Aurora data catalog. Schedule the Lambda functions to run periodically.
B) Use Amazon DynamoDB as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the DynamoDB data catalog. Schedule the Lambda functions to run periodically.
C) Use the AWS Glue Data Catalog as the central metadata repository. Extract the schema for Amazon RDS and Amazon Redshift sources, and build the Data Catalog. Use AWS Glue crawlers for data that is in Amazon S3 to infer the schema and to automatically update the Data Catalog.
D) Use the AWS Glue Data Catalog as the central metadata repository. Use AWS Glue crawlers to connect to multiple data stores and to update the Data Catalog with metadata changes. Schedule the crawlers to run periodically to update the metadata catalog.

4. A sales company uses AWS Glue ETL to collect, process, and ingest data into an Amazon S3 bucket. The AWS Glue pipeline creates a new file in the S3 bucket every hour. File sizes vary from 200 KB to 300 KB.
The company wants to build a sales prediction model by using data from the previous 5 years. The historic data includes 44,000 files.
The company builds a second AWS Glue ETL pipeline by using the smallest worker type. The second pipeline retrieves the historic files from the S3 bucket and processes the files for downstream analysis. The company notices significant performance issues with the second ETL pipeline.
The company needs to improve the performance of the second pipeline.
Which solution will meet this requirement MOST cost-effectively?

A) Enable AWS Glue auto scaling.
B) Increase the number of workers in the AWS Glue ETL jobs.
C) Use a larger worker type.
D) Use the AWS Glue DynamicFrame grouping option.

5. A data engineer is building an automated extract, transform, and load (ETL) ingestion pipeline by using AWS Glue. The pipeline ingests compressed files that are in an Amazon S3 bucket. The ingestion pipeline must support incremental data processing.
Which AWS Glue feature should the data engineer use to meet this requirement?

A) Classifiers
B) Triggers
C) Job bookmarks
D) Workflows

Questions and Answers:

Question #1
Answer: A

Question #2
Answer: C
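Question #2 centers on pulling specific fields out of raw web server logs inside an AWS Glue ETL job. The question does not say which log format the servers write, so the plain-Python sketch below assumes the Apache common log format; the regex, the function name `extract_fields`, and the output column names are hypothetical. In an actual Glue job the same parsing would typically run over a Spark DataFrame or DynamicFrame rather than line by line.

```python
import re
from typing import Optional

# Assumed input: Apache common log format, for example
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def extract_fields(line: str) -> Optional[dict]:
    """Return the selected fields from one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Keep only the columns a (hypothetical) warehouse table needs.
    return {
        "host": fields["host"],
        "timestamp": fields["timestamp"],
        "path": fields["path"],
        "status": int(fields["status"]),
        # "-" means no response body was sent; store 0 in that case.
        "bytes": 0 if fields["size"] == "-" else int(fields["size"]),
    }
```

For the sample line in the comment, `extract_fields` returns a dict whose `status` is `200` and `bytes` is `2326`. Lines that do not match return `None`, so they can be counted or routed aside instead of failing the whole job.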

Question #3
Answer: D
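For question #3, a single crawler that covers both the JDBC sources (Amazon RDS, Amazon Redshift) and the S3 prefixes, and that runs on a schedule, can be declared through the boto3 Glue client's `create_crawler` call. The helper below is a sketch that only assembles the request; the crawler name, role ARN, database name, connection names, paths, and cron expression are placeholders, and the JDBC targets assume Glue connections to the databases already exist.

```python
from typing import Dict, List, Tuple


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_paths: List[str],
    jdbc_targets: List[Tuple[str, str]],
    cron: str,
) -> Dict:
    """Build keyword arguments for the boto3 Glue client's create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            # Semistructured files (JSON, XML) under these S3 prefixes.
            "S3Targets": [{"Path": path} for path in s3_paths],
            # Each JDBC target pairs an existing Glue connection with an include path such as "db/%".
            "JdbcTargets": [
                {"ConnectionName": conn, "Path": path} for conn, path in jdbc_targets
            ],
        },
        # Periodic runs; Glue schedules are cron(...) expressions evaluated in UTC.
        "Schedule": f"cron({cron})",
        # Write detected schema changes to the Data Catalog; only log deleted objects.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }
```

With AWS credentials configured, the request could be submitted as `boto3.client("glue").create_crawler(**request)`. Setting `UpdateBehavior` to `UPDATE_IN_DATABASE` is what lets each scheduled run write detected schema changes back to the Data Catalog.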

Question #4
Answer: D
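Question #4 describes the small-files problem: about 44,000 objects of 200 KB to 300 KB each. When AWS Glue reads each small file as its own task, per-task overhead dominates, and according to the AWS Glue documentation automatic grouping is only enabled when an S3 dataset has more than 50,000 files, so this dataset falls below that threshold. Grouping can be requested explicitly through read options such as `groupFiles` (`"inPartition"`) and `groupSize` (a byte count passed as a string). The pure-Python function below is not Glue's implementation; it is a simplified greedy packing, assuming uniform 250 KB files and a hypothetical 1 MB (1024 KB) target, that shows how grouping reduces the number of read units.

```python
from typing import List


def group_files(sizes_kb: List[int], target_group_kb: int) -> List[List[int]]:
    """Greedily pack consecutive small files into groups of at most roughly target_group_kb."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in sizes_kb:
        # Close the current group once adding this file would exceed the target size.
        if current and current_size + size > target_group_kb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

Under those assumptions, `len(group_files([250] * 44_000, 1024))` is `11000`: each group holds four files (1,000 KB), so the job handles a quarter as many read units without adding or enlarging workers.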

Question #5
Answer: C
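Question #5 relies on AWS Glue job bookmarks, which persist state from earlier runs so that each run processes only data it has not already seen. In a Glue script this is typically switched on with the `--job-bookmark-option job-bookmark-enable` job parameter, a `transformation_ctx` on each source, and a final `job.commit()`. The sketch below only models the idea in plain Python; the `Bookmark` class and `run_incremental` function are hypothetical, and they track processed S3 keys, whereas Glue's S3 bookmarks are based on object modification timestamps.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Bookmark:
    """Hypothetical stand-in for the state a job bookmark persists between runs."""
    processed: Set[str] = field(default_factory=set)


def run_incremental(keys: List[str], bookmark: Bookmark) -> List[str]:
    """Return only the keys not handled by earlier runs, then advance the bookmark."""
    new_keys = sorted(set(keys) - bookmark.processed)
    # A real job would decompress, transform, and load new_keys here.
    bookmark.processed.update(new_keys)
    return new_keys
```

Calling `run_incremental` with `["a.gz", "b.gz"]` returns both keys; a second call with `["a.gz", "b.gz", "c.gz"]` on the same bookmark returns only `["c.gz"]`.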