A data engineer is configuring an AWS Glue Apache Spark extract, transform, and load (ETL) job. The job contains a sort-merge join of two large and equally sized DataFrames. The job is failing with the following error: No space left on device. Which solution will resolve the error?
A company receives test results from testing facilities that are located around the world. The company stores the test results in millions of 1 KB JSON files in an Amazon S3 bucket. A data engineer needs to process the files, convert them into Apache Parquet format, and load them into Amazon Redshift tables. The data engineer uses AWS Glue to process the files, AWS Step Functions to orchestrate the processes, and Amazon EventBridge to schedule jobs. The company recently added more testing facilities. The time required to process files is increasing. The data engineer must reduce the data processing time. Which solution will MOST reduce the data processing time?
A sales company uses AWS Glue ETL to collect, process, and ingest data into an Amazon S3 bucket. The
AWS Glue pipeline creates a new file in the S3 bucket every hour. File sizes vary from 200 KB to 300 KB.
The company wants to build a sales prediction model by using data from the previous 5 years. The historic
data includes 44,000 files.
The company builds a second AWS Glue ETL pipeline by using the smallest worker type. The second
pipeline retrieves the historic files from the S3 bucket and processes the files for downstream analysis. The
company notices significant performance issues with the second ETL pipeline.
The company needs to improve the performance of the second pipeline.
Which solution will meet this requirement MOST cost-effectively?
A data engineer is launching an Amazon EMR cluster. The data that the data engineer needs to load into the new cluster is currently in an Amazon S3 bucket. The data engineer needs to ensure that data is encrypted both at rest and in transit. The data that is in the S3 bucket is encrypted by an AWS Key Management Service (AWS KMS) key. The data engineer has an Amazon S3 path that has a Privacy Enhanced Mail (PEM) file. Which solution will meet these requirements?