🐾Computing for Data and ML - EMR🐾
Advantages:
🔹 You can launch a distributed big data environment in a few clicks
🔹 You can choose between transient and always-on clusters
🔹 You can granularly configure memory, CPU, and GPU
🔹 You can tune autoscaling parameters
🔹 You can use not only Spark but also Hive, Hudi, etc.
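To make the autoscaling point a bit more concrete, here is a minimal sketch of the limits you might tune. The dictionary follows the shape of the managed scaling policy accepted by boto3's EMR client (`put_managed_scaling_policy`). The helper name, the capacity numbers, and the cluster ID in the comment are illustrative assumptions, not recommendations.

```python
def build_managed_scaling_policy(min_units: int, max_units: int) -> dict:
    """Build a managed scaling policy dict (hypothetical helper, illustrative values)."""
    if not 1 <= min_units <= max_units:
        raise ValueError("expected 1 <= min_units <= max_units")
    return {
        "ComputeLimits": {
            "UnitType": "Instances",            # scale by number of instances
            "MinimumCapacityUnits": min_units,  # lower bound for the cluster size
            "MaximumCapacityUnits": max_units,  # upper bound for the cluster size
        }
    }


policy = build_managed_scaling_policy(2, 10)

# With AWS credentials configured, the policy could be applied roughly like this
# (the cluster ID is a placeholder):
#   import boto3
#   boto3.client("emr").put_managed_scaling_policy(
#       ClusterId="j-EXAMPLE", ManagedScalingPolicy=policy
#   )
```

The sketch only builds the request payload, so it runs without an AWS account; the actual API call is left as a comment.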
Limitations:
🔸 A cluster takes 7-10 minutes to launch
🔸 AWS does not guarantee durability for data kept on the EMR cluster itself, so persist anything important to S3
Use cases:
🔹 Distributed processing of big data using Spark
🔹 Transient ETL workloads
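As a rough illustration of a transient (short-lived) ETL cluster, the sketch below builds a request in the shape expected by boto3's `run_job_flow`. Setting `KeepJobFlowAliveWhenNoSteps` to `False` is what makes the cluster shut down once its steps finish. The helper name, release label, instance types and counts, S3 paths, and default role names are all assumptions chosen for the example.

```python
def build_transient_cluster_request(name: str, script_s3_uri: str, log_s3_uri: str) -> dict:
    """Build a run_job_flow request for a one-off Spark job (hypothetical helper)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick the one you need
        "LogUri": log_s3_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # False = transient: terminate the cluster when no steps are left
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "spark-etl",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


request = build_transient_cluster_request(
    "nightly-etl",
    "s3://example-bucket/jobs/etl.py",  # placeholder script location
    "s3://example-bucket/logs/",        # placeholder log location
)

# With AWS credentials configured, the cluster could be launched roughly like this:
#   import boto3
#   response = boto3.client("emr").run_job_flow(**request)
```

For an always-on cluster you would set `KeepJobFlowAliveWhenNoSteps` to `True` and submit steps later instead of bundling them into the launch request.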