🐾Computing for Data and ML - Glue🐾
Advantages:
🔹 You don’t need to manage underlying resources
🔹 You can create ETL pipelines without coding
🔹 Glue Parquet writer allows faster Parquet file writes
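A minimal sketch of the last point, assuming a Glue PySpark job: the optimized writer is usually selected by passing `format="glueparquet"` when writing a DynamicFrame (newer Glue versions may expose it differently). The helper names and the bucket path are hypothetical, and the `glue_context` and `dynamic_frame` objects exist only inside a Glue job.

```python
def parquet_sink_options(path, use_glue_writer=True):
    """Build keyword arguments for glueContext.write_dynamic_frame.from_options."""
    return {
        "connection_type": "s3",
        "connection_options": {"path": path},
        # "glueparquet" selects the Glue-optimized Parquet writer;
        # "parquet" uses the standard Spark Parquet writer instead.
        "format": "glueparquet" if use_glue_writer else "parquet",
    }


def write_parquet(glue_context, dynamic_frame, path):
    """Write a DynamicFrame to S3. Only meaningful inside a Glue job."""
    glue_context.write_dynamic_frame.from_options(
        frame=dynamic_frame, **parquet_sink_options(path)
    )


# Hypothetical output location, for illustration only.
options = parquet_sink_options("s3://example-bucket/output/")
```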
Limitations:
🔸 You can use only Scala and Python to write scripts
🔸 Supported data formats: CSV, Parquet, XML, JSON, Avro, ORC, Ion, grokLog
🔸 You cannot install arbitrary Python libraries; only those predefined by AWS are available
Use cases:
🔹 Repeated workloads, such as daily ETLs
🔹 Workloads that benefit from using Spark
🔹 Building a Data Catalog and detecting data schema changes
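For the daily ETL case, one possible setup is a scheduled Glue trigger that starts an existing job once a day. The sketch below only assembles the parameters for boto3's `create_trigger` call; the job name, trigger naming scheme, and helper functions are assumptions for illustration, and actually creating the trigger requires boto3 plus AWS credentials.

```python
def daily_trigger_params(job_name, hour_utc=2):
    """Build parameters for a Glue trigger that runs job_name once a day."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    return {
        "Name": f"{job_name}-daily",  # hypothetical naming scheme
        "Type": "SCHEDULED",
        # AWS cron fields: minutes hours day-of-month month day-of-week year
        "Schedule": f"cron(0 {hour_utc} * * ? *)",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def create_daily_trigger(job_name, hour_utc=2):
    """Create the trigger in AWS. Needs boto3 and valid credentials."""
    import boto3  # imported lazily so the helper above works without it

    glue = boto3.client("glue")
    return glue.create_trigger(**daily_trigger_params(job_name, hour_utc))


# "daily-sales-etl" is a made-up job name.
params = daily_trigger_params("daily-sales-etl")
```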