As Sunil recently pointed out, you need to be careful with how you use the cloud. One of the easy traps to fall into involves the Simple Storage Service (S3), where you can basically write files forever… and pay the bill later.
Since we store a good amount of data every day, managing it is important to us. My first reflex was to create an internal tool called “s3Cleanup”, which took a JSON configuration with prefixes as keys and expirations as values. It ran periodically, it was slow, and it wasn’t free, since we had to pay for all the required LIST calls.
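The idea behind such a tool can be sketched roughly as follows. This is my own reconstruction, not the actual s3Cleanup code: the `CONFIG` dictionary and `is_expired` helper are hypothetical names, and the billable listing loop is shown in comments.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical reconstruction of the s3Cleanup configuration:
# key prefixes mapped to expiration ages in days.
CONFIG = {
    "logs/": 30,
    "tmp/": 7,
}

def is_expired(key, last_modified, config, now):
    """Return True if the object's key matches a configured prefix
    and the object is older than that prefix's expiration."""
    for prefix, days in config.items():
        if key.startswith(prefix):
            return now - last_modified > timedelta(days=days)
    return False

# Actually deleting means paginating LIST calls over every object in
# the bucket -- and each page of results is a billable request:
#
#   import boto3
#   s3 = boto3.client("s3")
#   paginator = s3.get_paginator("list_objects_v2")
#   for page in paginator.paginate(Bucket="my-bucket"):
#       for obj in page.get("Contents", []):
#           if is_expired(obj["Key"], obj["LastModified"], CONFIG,
#                         datetime.now(timezone.utc)):
#               s3.delete_object(Bucket="my-bucket", Key=obj["Key"])
```

Because the listing has to walk the entire bucket on every run, the cost and the runtime both grow with the number of objects, regardless of how many actually expire.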
Which brings us to a quick lesson: know your tools. S3 already provides a way to delete old files, called Object Lifecycle Management. All you have to do is configure it per bucket, and Amazon will then delete the files according to your configuration, free of charge.
AWS Console, S3 Properties -> Lifecycle
That may work if you only have a few rules. If you have many, though, repeatedly clicking “Add”, filling in the form and saving quickly becomes a waste of time.
So my next iteration was wiser. I created a simple Python command-line tool called s3-lifecycle-editor. It lets you do three things: dump a bucket’s configuration to the command line, edit a configuration and, more interestingly, replace a configuration from an XML file.
Replacing a configuration from a file is great because it allows you to version it and easily restore it if something goes wrong. And if you replicate the same bucket structure for each of your clients, it’s easy to reapply the same rules to all those buckets.
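For reference, such a file in the XML form the S3 REST API accepts could look like this (the ID, prefix and expiration values here are illustrative, not from the original post):

```xml
<LifecycleConfiguration>
  <Rule>
    <ID>expire-old-logs</ID>
    <Prefix>logs/</Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>30</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
```

A file like this can live in version control next to the rest of your infrastructure, ready to be reapplied to any bucket that shares the structure.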
s3-lifecycle-editor is open-sourced under the MIT license and available on GitHub.
Inline editing with s3-lifecycle-editor and vim