🐾SageMaker Inference cost optimisation🐾

Jul 23, 2023

Five ways to optimise SageMaker inference cost:

1️⃣ Choose the right instance for inference. To see whether an instance is under-utilised, check its utilisation metrics (such as CPUUtilization, MemoryUtilization and GPUUtilization) in Amazon CloudWatch. Use Amazon SageMaker Inference Recommender to benchmark the model on different instance types and compare their performance and cost.

2️⃣ Use autoscaling if your traffic is unsteady, instead of provisioning capacity manually. Manually provisioned capacity is usually sized for peak traffic, which leads to low utilisation and wasted resources the rest of the time.

3️⃣ Consider using Amazon Elastic Inference to attach low-cost GPU-powered acceleration to SageMaker instances and reduce the cost of deep learning inference by up to 75%.

4️⃣ Use Multi-model endpoints (MME) or Multi-container endpoints (MCE) to deploy several models into one endpoint.

5️⃣ Use Savings Plans. They offer up to 72% savings over the On-Demand price in exchange for a commitment to a specific amount of compute usage (measured in $/hour) for a one- or three-year term.
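
To make tip 2️⃣ more concrete, here is a minimal Python sketch of how target-tracking autoscaling could be configured for an endpoint variant through Application Auto Scaling. The endpoint name, the variant name `AllTraffic`, the capacity limits, the target of 70 invocations per instance and the cooldown values are all placeholders chosen for illustration, not recommendations; load-test your own model to pick them. The helper names `build_autoscaling_requests` and `apply_autoscaling` are hypothetical.

```python
def build_autoscaling_requests(endpoint_name, variant_name="AllTraffic",
                               min_instances=1, max_instances=4,
                               invocations_per_instance=70.0):
    """Build the two Application Auto Scaling requests for one endpoint variant.

    All default values are illustrative assumptions, not tuned settings.
    """
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")

    # Fields shared by both API calls; they identify the variant to scale.
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }

    # Request 1: register the variant as a scalable target with capacity limits.
    target = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}

    # Request 2: a target-tracking policy on invocations per instance.
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # seconds to wait before removing instances
            "ScaleOutCooldown": 60,   # seconds to wait before adding more instances
        },
    }
    return target, policy


def apply_autoscaling(target, policy):
    """Send both requests to AWS. Needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without AWS access

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


if __name__ == "__main__":
    # "my-endpoint" is a placeholder; nothing is sent to AWS here.
    target, policy = build_autoscaling_requests("my-endpoint")
    print(target)
    print(policy)
```

Building the requests separately from sending them lets you review the settings before calling `apply_autoscaling(target, policy)` against a real endpoint.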
