SageMaker Inference cost optimisation
Five ways to optimise SageMaker inference cost:
1️⃣ Choose the right instance for inference. To see whether an instance is under-utilised, check its utilisation metrics (CPU, GPU and memory) in Amazon CloudWatch. Use Inference Recommender to benchmark the model on different instance types and compare their performance and cost.
2️⃣ Use autoscaling if your traffic is unsteady, instead of provisioning capacity manually. Provisioning manually for peak load can lead to low utilisation and wasted resources the rest of the time.
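3️⃣ Consider using Amazon Elastic Inference to attach low-cost GPU-powered acceleration to SageMaker instances and reduce the cost of deep learning inference by up to 75%. Note that AWS stopped onboarding new customers to Elastic Inference in April 2023, so this option is only available to existing users.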
4️⃣ Use Multi-model endpoints (MME) or Multi-container endpoints (MCE) to deploy several models behind one endpoint, so that they share the same instances instead of each paying for its own.
5️⃣ Use Savings Plans. AWS Savings Plans offer up to 72% savings over the on-demand price (SageMaker Savings Plans advertise up to 64%), in exchange for a commitment to a consistent amount of usage, measured in $/hour, for a one- or three-year term.
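To make tip 2 concrete, here is a minimal sketch of a target-tracking autoscaling setup for an endpoint variant, written in Python. It uses the Application Auto Scaling API through boto3 (`register_scalable_target` and `put_scaling_policy`). The endpoint name, variant name, capacity limits, target value and cooldowns are all placeholder assumptions, not recommendations; pick them from your own load tests.

```python
def build_autoscaling_requests(endpoint_name, variant_name,
                               min_instances, max_instances,
                               target_invocations_per_instance):
    """Build the two request payloads for Application Auto Scaling.

    Returns (scalable_target_kwargs, scaling_policy_kwargs).
    Nothing is sent to AWS here, so the payloads can be inspected first.
    """
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    scalable_target = {
        **common,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    scaling_policy = {
        **common,
        # Policy name is a made-up convention for this sketch.
        "PolicyName": f"{endpoint_name}-{variant_name}-invocations",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Average invocations per instance per minute to aim for.
            "TargetValue": float(target_invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            # Cooldowns in seconds; assumed values, tune for your traffic.
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 60,
        },
    }
    return scalable_target, scaling_policy


def apply_autoscaling(scalable_target, scaling_policy):
    """Send both requests to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scalable_target)
    client.put_scaling_policy(**scaling_policy)


if __name__ == "__main__":
    # "my-endpoint" and "AllTraffic" are hypothetical names.
    target, policy = build_autoscaling_requests(
        "my-endpoint", "AllTraffic", 1, 4, 100)
    print(target)
    print(policy)
```

Keeping the payload builder separate from the AWS call makes it easy to review the settings before anything is applied.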
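Tips 1 and 5 come down to simple arithmetic, so a small Python sketch may help when comparing options. It computes the cost per million requests from an instance's hourly price and sustained throughput, and the effective hourly price after a commitment discount. The instance names, prices and throughput figures below are invented for illustration; use your own Inference Recommender results and current AWS pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceOption:
    name: str
    hourly_price_usd: float
    max_requests_per_second: float  # sustained throughput from a benchmark


def cost_per_million_requests(option, utilisation=1.0):
    """Cost of serving one million requests at the given utilisation (0-1]."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    requests_per_hour = option.max_requests_per_second * utilisation * 3600
    return option.hourly_price_usd / requests_per_hour * 1_000_000


def discounted_hourly_price(hourly_price_usd, discount):
    """Effective hourly price after a discount given as a fraction (0.2 = 20%)."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return hourly_price_usd * (1 - discount)


def cheapest(options, utilisation=1.0):
    """Pick the option with the lowest cost per million requests."""
    return min(options, key=lambda o: cost_per_million_requests(o, utilisation))


if __name__ == "__main__":
    # Hypothetical instances and numbers, for illustration only.
    options = [
        InstanceOption("small-cpu", 0.36, 100),
        InstanceOption("large-gpu", 1.44, 200),
    ]
    for opt in options:
        print(opt.name, round(cost_per_million_requests(opt, 0.5), 2))
    print("cheapest:", cheapest(options, 0.5).name)
```

The utilisation parameter shows why an under-used instance is expensive: halving utilisation doubles the cost per request, which is the waste that right-sizing and autoscaling aim to remove.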