🐾EMR Serverless: When it makes sense and when it doesn't 🐾
🤓 When I suggest going serverless, the response I hear most often is: “Oh, it is sooo expensive, I think we can manage it ourselves”. That is especially true when we are talking about EMR Serverless. But believe me, if you use it for the right use cases and configure it appropriately, it can be cheaper than EMR on EC2 or EMR on EKS.
How and Why it can be cheaper
EMR Serverless allows more fine-grained scaling, so you're not paying for idle resources during the job execution. It is quite hard to achieve high utilisation with EMR on EC2 or EMR on EKS. I even see people developing their own custom scaling solutions to avoid low utilisation periods and additional cost.
Serverless also lets you configure resources such as memory, CPU and storage more granularly. With EC2 instances, it is harder to match the exact resources you need.
EMR Serverless starts up quickly, while with EMR on EC2 you are also charged for the time instances sit idle during the setup period.
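To make the utilisation argument concrete, here is a small back-of-the-envelope sketch. All rates and job sizes below are made-up placeholders, not real AWS prices, and the function names are my own; plug in the numbers from the pricing page for your region before drawing any conclusions.

```python
def serverless_cost(vcpu_hours, gb_hours, vcpu_rate, gb_rate):
    """Pay only for the vCPU-hours and GB-hours the job actually consumed."""
    return vcpu_hours * vcpu_rate + gb_hours * gb_rate


def cluster_cost(nodes, hours_up, node_rate):
    """Pay for every node for as long as the cluster is up, busy or idle."""
    return nodes * hours_up * node_rate


def utilisation(used_vcpu_hours, nodes, vcpus_per_node, hours_up):
    """Share of the provisioned vCPU-hours that did useful work."""
    return used_vcpu_hours / (nodes * vcpus_per_node * hours_up)


# Hypothetical placeholder rates, NOT real AWS prices.
VCPU_RATE = 0.05   # per vCPU-hour, serverless
GB_RATE = 0.005    # per GB-hour, serverless
NODE_RATE = 0.25   # per hour for one node with 4 vCPU / 16 GB

# Assumed job: 40 vCPU-hours and 160 GB-hours of real work,
# run on a 10-node cluster that stays up for 2 hours in total
# (startup, ramp-up and idle tail included).
serverless = serverless_cost(40, 160, VCPU_RATE, GB_RATE)
cluster = cluster_cost(10, 2, NODE_RATE)
used = utilisation(40, 10, 4, 2)

print(f"serverless: {serverless:.2f}")
print(f"cluster:    {cluster:.2f}")
print(f"cluster utilisation: {used:.0%}")
```

In this made-up scenario the serverless rate per unit of compute is higher than the cluster rate, yet the job still comes out cheaper because the cluster is only half utilised. The same arithmetic flips once utilisation gets close to 100%.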
Use cases
EMR Serverless is a good choice for the following types of workloads:
Unpredictable timing and duration: you run analytics jobs a few times a day or week, and they can take anywhere from 10 minutes to 1 hour.
Short periodic runs: a retail company running end-of-day sales analysis that takes 30 minutes each evening.
You struggle with managing scaling: some jobs need 10 workers, others need 100, and coordinating node scaling around job start and finish is painful.
You don’t want operational overhead: straightforward ETL jobs, especially when integrated with services like AWS Glue Data Catalog or Step Functions.
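As a rough illustration of the "10 workers vs 100 workers" case, each job can carry its own sizing instead of sharing one cluster shape. This is a minimal sketch: the helper name, application ID, role ARN and script paths are hypothetical placeholders, and the snippet only builds the request without sending anything to AWS.

```python
def build_job_request(application_id, role_arn, script_uri, max_executors,
                      executor_cores=4, executor_memory_gb=8):
    """Build a per-job request with its own executor size and scaling cap."""
    # Standard Spark properties; the values here are illustrative only.
    spark_params = " ".join([
        f"--conf spark.executor.cores={executor_cores}",
        f"--conf spark.executor.memory={executor_memory_gb}g",
        f"--conf spark.dynamicAllocation.maxExecutors={max_executors}",
    ])
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": script_uri,
                "sparkSubmitParameters": spark_params,
            }
        },
    }


# Placeholder identifiers, replace with your own.
APP_ID = "my-application-id"
ROLE_ARN = "arn:aws:iam::123456789012:role/my-job-role"

small_job = build_job_request(APP_ID, ROLE_ARN, "s3://my-bucket/small.py", max_executors=10)
large_job = build_job_request(APP_ID, ROLE_ARN, "s3://my-bucket/large.py", max_executors=100)

print(small_job["jobDriver"]["sparkSubmit"]["sparkSubmitParameters"])
print(large_job["jobDriver"]["sparkSubmit"]["sparkSubmitParameters"])
```

Assuming you use boto3, a dictionary like this could be passed as keyword arguments to the `start_job_run` call of the `emr-serverless` client. The point is that the small job and the large job no longer compete for, or wait on, the same set of nodes.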
When EMR Serverless should definitely not be your choice:
Consistent workloads: you have many routine jobs whose resource consumption is stable and predictable.
Cluster customization: you need a custom AMI or specific Hadoop ecosystem components.
Thank you for reading, let’s chat 💬
💬 Have you tried EMR Serverless?
💬 Have you noticed any issues or missing features?
💬 What is your biggest challenge when using EMR?
I love hearing from readers 🫶🏻 Please feel free to drop comments, questions, and opinions below👇🏻