🐾3 things you definitely want to know about S3 lifecycle configuration🐾
🤓 The answer to the question “How not to create a data swamp in your S3 buckets?” is pretty easy. Obviously, you should structure your buckets properly and use lifecycle policies, but do you know all the capabilities that S3 lifecycle policies can offer?
1️⃣ Scenario 1: You want to store backup files on S3 and don’t want to keep old backups, so you use an S3 lifecycle policy. At the same time, you are afraid that if the backup process fails, all backups will be deleted, including the last one. You are looking for a way to delete only old backup files and, if only one file is left, to keep it.
You can resolve this problem by enabling versioning on the S3 bucket and storing backups as different versions of one file. After that, specify the NoncurrentVersionExpiration parameter to define when non-current object versions expire. This way, the current version of the file is safe from being deleted by the lifecycle policy.
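Here is a minimal boto3 sketch of that setup (the bucket name, prefix, and 30-day retention are placeholders, not recommendations):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-backup-bucket"  # hypothetical bucket name

# Enable versioning so backups can be stored as versions of the same object
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Expire only non-current versions; the current (latest) backup is never touched
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-backup-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```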
2️⃣ Scenario 2: To save costs, you want to archive files to Glacier two months after creation. You have 1000 log files with an average size of 150 KB stored in the bucket.
Transitioning small objects to Glacier or Glacier Deep Archive can actually increase your S3 costs because of the per-object transition charge, so in the given scenario you will pay:
1000 files * $0.02 per transition request = $20
In this case, it is better to aggregate your small files into bigger ones. If you are looking for a file aggregation solution, you can check the s3-small-objects-compaction solution developed by Josh Hart.
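As a side note, if you still want to archive only the objects that are worth the transition fee, lifecycle rules support size-based filters. A minimal boto3 sketch, assuming a hypothetical bucket, a logs/ prefix, and a 128 KB threshold chosen purely for illustration:

```python
import boto3

s3 = boto3.client("s3")

# Transition to Glacier only objects larger than 128 KB, 60 days after creation,
# so tiny log files don't rack up per-object transition charges
s3.put_bucket_lifecycle_configuration(
    Bucket="my-log-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-large-logs",
                "Status": "Enabled",
                "Filter": {
                    "And": {
                        "Prefix": "logs/",
                        "ObjectSizeGreaterThan": 131072,  # 128 KB, in bytes
                    }
                },
                "Transitions": [{"Days": 60, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```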
3️⃣ Scenario 3: You need to upload large files to the S3 bucket and use multipart upload to accelerate the process. Sometimes a multipart upload doesn’t complete successfully, and the uploaded parts remain in the bucket, incurring storage costs.
To delete such incomplete uploads automatically, configure the lifecycle policy with the AbortIncompleteMultipartUpload parameter. You specify the number of days after upload initiation, and any parts of uploads that were not completed within that timeframe will be removed.
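A minimal boto3 sketch of such a rule (the bucket name and the 7-day window are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Abort multipart uploads that haven't completed within 7 days of initiation,
# so leftover parts stop incurring storage costs
s3.put_bucket_lifecycle_configuration(
    Bucket="my-upload-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-stale-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```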
Thank you for reading, let’s chat 💬
💬 Do you use S3 lifecycle configuration in your workloads?
💬 Do you have any use cases that cannot be covered by existing functionality?
💬 Do you know any other tips for using lifecycle configurations?
I love hearing from readers 🫶🏻 Please feel free to drop comments, questions, and opinions below👇🏻