🐾Amazon EMR: Team collaboration and resource sharing🐾
🤓 Have you ever had a problem where one person on the team takes all the resources of an EMR cluster meant for the entire team? How do you share resources across the team in a way that satisfies everyone?
Solution
The most obvious solution is to split resources equally between engineers. In EMR provisioned clusters, this can be done by configuring a YARN queue for each engineer. Another approach is to create a cluster per engineer, which is not cost-efficient with EMR provisioned clusters but works well if you use EMR Serverless.
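As a rough illustration of the YARN-queue approach, here is a minimal Python sketch that builds an EMR `capacity-scheduler` configuration with one queue per engineer. The engineer names are hypothetical, and splitting 100% into integer shares with a hard cap (`maximum-capacity` equal to `capacity`) is just one possible policy.

```python
"""Sketch: EMR 'capacity-scheduler' configuration with one YARN queue per engineer."""
import json
from typing import Dict, List


def equal_shares(n: int) -> List[int]:
    """Split 100% into n integer shares that sum to exactly 100."""
    if n <= 0:
        raise ValueError("need at least one engineer")
    base, remainder = divmod(100, n)
    # The first `remainder` queues get one extra percent so the total is 100.
    return [base + 1 if i < remainder else base for i in range(n)]


def capacity_scheduler_config(engineers: List[str]) -> List[Dict]:
    """Return an EMR Configurations list defining one capped queue per engineer."""
    shares = equal_shares(len(engineers))
    props: Dict[str, str] = {
        # Replaces the root queue list, so no 'default' queue is kept.
        "yarn.scheduler.capacity.root.queues": ",".join(engineers),
    }
    for name, share in zip(engineers, shares):
        prefix = f"yarn.scheduler.capacity.root.{name}"
        props[f"{prefix}.capacity"] = str(share)
        # Hard cap: a queue cannot grow beyond its own share.
        props[f"{prefix}.maximum-capacity"] = str(share)
    return [{"Classification": "capacity-scheduler", "Properties": props}]


if __name__ == "__main__":
    # Hypothetical team members.
    print(json.dumps(capacity_scheduler_config(["alice", "bob", "carol"]), indent=2))
```

The returned list could be passed as `Configurations` to boto3's `run_job_flow` when creating the cluster. Because this sketch keeps no `default` queue, each engineer would submit jobs to an explicit queue, e.g. `spark-submit --queue alice ...`. Setting `maximum-capacity` above `capacity` would let queues borrow idle resources, at the cost of weaker isolation.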
You can create a Service Catalog product for the EMR Serverless deployment option. To create a product in Service Catalog, you need a CloudFormation template or Terraform code. Administrators can specify properties, define minimum and maximum values or allowed values, and even validate them with regular expressions.
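Below is a minimal sketch of what such a product template might look like, assuming the CloudFormation route. It is generated from Python here only to keep one language in this post; in practice you would likely write the YAML or JSON directly. The parameter names (`ApplicationName`, `MaxCpu`, `MaxMemoryGb`), the release labels, the regex, and the 15-minute idle timeout are all illustrative assumptions. The constraints use standard CloudFormation parameter keys: `MinValue`/`MaxValue`, `AllowedValues`, and `AllowedPattern`.

```python
"""Sketch: CloudFormation template for a Service Catalog product that creates an EMR Serverless application."""
import json
from typing import Any, Dict


def emr_serverless_product_template(max_cpu_limit: int = 16, max_memory_limit_gb: int = 64) -> Dict[str, Any]:
    """Build a template whose parameters are bounded by administrator-chosen limits."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Per-engineer EMR Serverless application (sketch)",
        "Parameters": {
            "ApplicationName": {
                "Type": "String",
                # Regular-expression validation of the name the engineer enters.
                "AllowedPattern": "^[a-z][a-z0-9-]{2,30}$",
                "ConstraintDescription": "lowercase letters, digits and hyphens",
            },
            "ReleaseLabel": {
                "Type": "String",
                # Hypothetical list: replace with the releases your team supports.
                "AllowedValues": ["emr-6.15.0", "emr-7.0.0"],
                "Default": "emr-7.0.0",
            },
            # Min/max bounds the engineer cannot exceed.
            "MaxCpu": {"Type": "Number", "MinValue": 1, "MaxValue": max_cpu_limit,
                       "Default": min(4, max_cpu_limit)},
            "MaxMemoryGb": {"Type": "Number", "MinValue": 2, "MaxValue": max_memory_limit_gb,
                            "Default": min(16, max_memory_limit_gb)},
        },
        "Resources": {
            "Application": {
                "Type": "AWS::EMRServerless::Application",
                "Properties": {
                    "Name": {"Ref": "ApplicationName"},
                    "ReleaseLabel": {"Ref": "ReleaseLabel"},
                    "Type": "SPARK",
                    # Upper bound on what the application may scale to.
                    "MaximumCapacity": {
                        "Cpu": {"Fn::Sub": "${MaxCpu} vCPU"},
                        "Memory": {"Fn::Sub": "${MaxMemoryGb} GB"},
                    },
                    # Release workers when the application sits idle.
                    "AutoStopConfiguration": {"Enabled": True, "IdleTimeoutMinutes": 15},
                },
            }
        },
        "Outputs": {"ApplicationId": {"Value": {"Fn::GetAtt": ["Application", "ApplicationId"]}}},
    }


if __name__ == "__main__":
    print(json.dumps(emr_serverless_product_template(), indent=2))
```

The printed JSON could then be uploaded as the product's template in Service Catalog. Keeping the limits as function arguments makes it easy to publish, say, a smaller variant for analysts and a larger one for data scientists.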
Benefits:
Engineers won’t compete for resources in an EMR cluster because everyone has a dedicated EMR Serverless application.
It’s easy to onboard new team members, as they can create their own EMR Serverless application from the Service Catalog product.
As an administrator, you can define limits for the configuration, so engineers cannot create oversized applications.
With EMR Serverless, you pay only for the vCPU, memory, and storage your workloads actually use, for as long as they use them.
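To make the onboarding step concrete, here is a hedged sketch of how a new engineer might provision their own application with boto3's Service Catalog `provision_product` call. The product and provisioning-artifact IDs are placeholders, and the parameter keys assume a product template that declares `ApplicationName`, `MaxCpu`, and `MaxMemoryGb`; adapt them to your actual product. Values outside the template's limits should be rejected during provisioning rather than silently accepted.

```python
"""Sketch: an engineer provisions their own EMR Serverless application from a Service Catalog product."""
from typing import Dict, List


def provisioning_request(engineer: str, product_id: str, artifact_id: str,
                         max_cpu: int, max_memory_gb: int) -> Dict:
    """Build keyword arguments for servicecatalog.provision_product."""
    parameters: List[Dict[str, str]] = [
        # Assumed parameter names; they must match the product template.
        {"Key": "ApplicationName", "Value": f"emr-{engineer}"},
        {"Key": "MaxCpu", "Value": str(max_cpu)},
        {"Key": "MaxMemoryGb", "Value": str(max_memory_gb)},
    ]
    return {
        "ProductId": product_id,
        "ProvisioningArtifactId": artifact_id,
        "ProvisionedProductName": f"emr-serverless-{engineer}",
        "ProvisioningParameters": parameters,
    }


def provision(request: Dict) -> str:
    """Submit the request to Service Catalog and return the provisioning record ID."""
    import boto3  # imported lazily so the request builder works without AWS access
    client = boto3.client("servicecatalog")
    response = client.provision_product(**request)
    return response["RecordDetail"]["RecordId"]
```

Usage (placeholder IDs): `provision(provisioning_request("alice", "prod-EXAMPLE", "pa-EXAMPLE", 4, 16))`. Separating the request builder from the API call keeps the part that encodes team conventions testable without AWS credentials.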
🦊 EMR Serverless is a great option for running periodic workloads, such as data analysts running queries or data scientists experimenting with ML models.
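Because EMR Serverless bills only for the vCPU, memory, and storage that are actually used, the cost of a periodic workload can be estimated from its usage alone, assuming no pre-initialized capacity is kept running between runs. The sketch below shows that arithmetic. The prices are deliberately left as inputs: look up current per-Region rates on the AWS pricing page rather than trusting any number here. Modeling workers as identical is a simplifying assumption.

```python
"""Sketch: estimate EMR Serverless cost from the resources a job actually used."""
from dataclasses import dataclass


@dataclass
class Usage:
    vcpu_hours: float
    memory_gb_hours: float
    storage_gb_hours: float = 0.0  # billable storage only


@dataclass
class Prices:
    """Per-unit prices for your Region; fill these in from the AWS pricing page."""
    per_vcpu_hour: float
    per_gb_hour: float
    per_storage_gb_hour: float


def worker_usage(workers: int, vcpu: float, memory_gb: float, hours: float) -> Usage:
    """Usage of `workers` identical workers that each ran for `hours`."""
    return Usage(vcpu_hours=workers * vcpu * hours,
                 memory_gb_hours=workers * memory_gb * hours)


def estimate_cost(usage: Usage, prices: Prices) -> float:
    """Sum of resource-hours times unit price; idle time between runs adds nothing."""
    return (usage.vcpu_hours * prices.per_vcpu_hour
            + usage.memory_gb_hours * prices.per_gb_hour
            + usage.storage_gb_hours * prices.per_storage_gb_hour)
```

For example, `worker_usage(10, 4, 16, 0.5)` describes ten 4 vCPU / 16 GB workers running for 30 minutes (20 vCPU-hours, 80 GB-hours). Multiplying by the number of runs per month gives a rough monthly figure to compare with an always-on provisioned cluster.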
Thank you for reading, let’s chat 💬
💬 Have you ever used EMR Serverless applications?
💬 Have you encountered any issues or limitations with such a solution?
💬 Which tips and tricks do you use for your Big Data workloads?
I love hearing from readers 🫶🏻 Please feel free to drop comments, questions, and opinions below 👇🏻