🐾 EC2 Auto Scaling: most common mistakes I've seen 🐾
🤓 EC2 Auto Scaling is a widely used feature in many architectures. It may seem like a simple component that's easy to configure. However, through my experience working with different clients, I've noticed that even seemingly straightforward services leave room for mistakes. Today, I'd like to share some of the most common ones I've encountered.
1️⃣ Choose scale-in threshold wisely
Companies often focus their autoscaling planning on scale-out policies to ensure smooth user experience. However, sometimes they fail to give equal attention to scale-in policies.
Use case: imagine your application typically uses 50% of its CPU capacity with three instances running. You set your scale-out policy to trigger at 70% utilization and your scale-in policy at 30%. During a peak load, the Auto Scaling group launches two additional instances. Once the load decreases, CPU utilization drops to 35-40% across five instances. That is still above the 30% scale-in threshold, so the group never shrinks back. However, the same load could be efficiently handled by three instances at 60-65% utilization. This means you're paying for two underutilized instances.
Monitor your application and carefully choose scale-in thresholds so you can not only expand your fleet quickly to meet demand but also shrink it effectively when demand decreases.
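The arithmetic behind the use case above can be sketched with a small helper. This is a simplified model (it assumes total load, i.e. instance count times utilization, stays roughly constant); the function name is mine, not an AWS API:

```python
def utilization_after_scale(current_count, current_util, new_count):
    """Estimate per-instance CPU utilization if the fleet changed size,
    assuming the total load stays constant."""
    total_load = current_count * current_util
    return total_load / new_count

# Five instances at 36% CPU after the peak: a 30% scale-in threshold
# never fires, yet three instances could carry the same load at 60%.
print(utilization_after_scale(5, 36, 3))  # 60.0
```

Running this kind of back-of-the-envelope check against your real CloudWatch numbers helps you pick a scale-in threshold that actually lets the fleet contract.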
2️⃣ Know your application warm-up time
Some applications have longer warm-up time than others and you definitely want to know whether your application is one of them.
Use case: suppose your application is complex and needs 5-7 minutes to warm up before it can start processing requests. When the load increases, the tracked metric exceeds its target and triggers a scaling action. The Auto Scaling group launches an additional instance. After it starts, the application begins warming up. During this time, the metric stays above the target because the new instance isn't processing requests yet. Five minutes later (the default cooldown period), the Auto Scaling group launches another instance, while the previous one is still warming up.
To prevent this kind of over-provisioning, set the cooldown period to slightly exceed your application's warm-up time. This gives new instances sufficient time to initialize and begin handling traffic. You can configure this using the DefaultCooldown parameter at the Auto Scaling group level. Also, you can specify cooldown periods for individual scaling policies.
⚠️ Remember that cooldown is different from the instance warmup time setting. Warmup time determines how long before a new instance's metrics are counted in the Auto Scaling group's aggregated metrics. Be sure to configure both appropriately.
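As a minimal sketch of the rule "cooldown should slightly exceed warm-up", here's one way to set DefaultCooldown with boto3 (the UpdateAutoScalingGroup API is real; the group name and the 60-second buffer are my own assumptions, and the call requires AWS credentials, so it's defined but not invoked here):

```python
def recommended_cooldown(warmup_seconds, buffer_seconds=60):
    """Cooldown should slightly exceed the application's warm-up time."""
    return warmup_seconds + buffer_seconds

def apply_cooldown(group_name, warmup_seconds):
    """Set the group-level cooldown via the UpdateAutoScalingGroup API.
    Requires AWS credentials; not called in this sketch."""
    import boto3  # deferred import: only needed when actually calling AWS
    autoscaling = boto3.client("autoscaling")
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        DefaultCooldown=recommended_cooldown(warmup_seconds),
    )

# For a 5-7 minute warm-up, plan around the upper bound: 7 min = 420 s
print(recommended_cooldown(420))  # 480
```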
3️⃣ Set reasonable maximum limit
When configuring an Auto Scaling group, companies often set the maximum capacity far above their current needs, because "what can actually go wrong with it?". Usually, the reasoning is the following:
- the ASG tracks a target metric, so it won't scale instances beyond what's actually needed anyway;
- you won't need to keep raising the limit as your application grows.
Use case: suppose your application typically scales between 5-10 instances, but you set the maximum limit to 50. Since you've never come close to that capacity, you don't set any alerts on instance counts. Now, imagine a DDoS attack floods your application with malicious requests. The Auto Scaling group scales to handle the traffic and eventually launches all 50 instances, leaving you with a significantly higher bill.
Consider setting the maximum capacity closer to your actual needs (e.g., 2-3 times your typical peak load). Additionally, configure CloudWatch alarms to alert you when scaling approaches your defined thresholds. This way, you can monitor growth and adjust your fleet size proactively.
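A sketch of both suggestions in boto3 (PutMetricAlarm and the GroupInServiceInstances metric are real, though the metric requires group metrics collection to be enabled on the ASG; the 3x headroom factor, alarm name, and group name are my own illustrative choices, so the AWS-calling function is defined but not invoked):

```python
def sensible_max_capacity(typical_peak, headroom_factor=3):
    """Cap max capacity at a small multiple of your observed peak
    instead of an arbitrarily large number."""
    return typical_peak * headroom_factor

def create_capacity_alarm(group_name, threshold):
    """Alarm when in-service instances approach the configured maximum.
    Requires AWS credentials and ASG group metrics collection enabled."""
    import boto3  # deferred import: only needed when actually calling AWS
    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        AlarmName=f"{group_name}-approaching-max-capacity",
        Namespace="AWS/AutoScaling",
        MetricName="GroupInServiceInstances",
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": group_name}],
        Statistic="Maximum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=threshold,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
    )

# Typical peak of 10 instances -> max capacity of 30; alert well before it
print(sensible_max_capacity(10))  # 30
```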
Thank you for reading, let’s chat 💬
💬 Any other mistakes you spotted while working with ASG?
💬 Any tips you can share on how to choose configuration for ASG?
💬 Any other topics you would like me to cover?
I love hearing from readers 🫶🏻 Please feel free to drop comments, questions, and opinions below👇🏻