🐾DynamoDB: capture new records efficiently🐾
🤓 Let’s imagine that you have files uploaded to S3. On each upload you want to write some information about the file to DynamoDB and then kick off file processing. How would you design such a workload?
Option 1: Use a Lambda function to write the information to DynamoDB and then put an event on an SQS queue to trigger further processing.
What could possibly go wrong in this scenario? A few things:
❌ The Lambda function could fail before deleting the event from the source queue and end up processing the same event twice. To handle this, make your processing idempotent so duplicates don’t affect the correctness of your business logic (see the conditional write in the sketch after this list).
❌ The Lambda function could fail after writing to DynamoDB but before writing the event to the target queue. The source event is only redelivered once the SQS visibility timeout expires, so there will be extra latency between the information landing in DynamoDB and the message landing on the target queue.
✅ On the other hand, the benefit of a Lambda function is that you can add business logic of your own. If you want to not only pass along the information written to DynamoDB but also enrich it with extra fields, a Lambda function is the right tool. A minimal sketch follows.
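Here’s what such a handler could look like in Python with boto3. All names (table, queue URL, key schema) are hypothetical, and it assumes S3 invokes the function directly; with an SQS source, each record wraps the S3 notification one level deeper. The conditional put is what makes the duplicate-delivery case above harmless:

```python
import json
import os

import boto3

# Hypothetical resource names -- adjust to your setup.
TABLE_NAME = os.environ.get("TABLE_NAME", "file-metadata")
TARGET_QUEUE_URL = os.environ.get("TARGET_QUEUE_URL", "")

dynamodb = boto3.client("dynamodb")
sqs = boto3.client("sqs")


def handler(event, context):
    # One S3 event notification can carry several records.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        try:
            # Conditional put makes the write idempotent: a retried
            # delivery of the same object becomes a no-op.
            dynamodb.put_item(
                TableName=TABLE_NAME,
                Item={
                    "pk": {"S": f"{bucket}/{key}"},
                    "size": {"N": str(record["s3"]["object"]["size"])},
                },
                ConditionExpression="attribute_not_exists(pk)",
            )
        except dynamodb.exceptions.ConditionalCheckFailedException:
            # Already recorded. Still (re)send the message below, so a
            # retry after a partial failure completes the second step.
            pass

        # Enrich the message with extra fields before queueing it --
        # the main benefit of Option 1.
        sqs.send_message(
            QueueUrl=TARGET_QUEUE_URL,
            MessageBody=json.dumps(
                {"bucket": bucket, "key": key, "stage": "uploaded"}
            ),
        )
```

Note that sending the SQS message even on a conditional-check failure is deliberate: it is exactly what recovers from the “wrote to DynamoDB, crashed before the queue” failure mode above, at the cost of possible duplicate messages, which your idempotent consumer already tolerates.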
Option 2: Use a Lambda function to write to DynamoDB, and DynamoDB Streams to capture the appends and send them on for further processing.
DynamoDB has a built-in mechanism for change data capture (CDC) called DynamoDB Streams, and enabling it is a single table setting (sketch below).
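A quick sketch with boto3, using a hypothetical table name. NEW_IMAGE is one of the four stream view types:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Table name is hypothetical. NEW_IMAGE means each stream record
# carries the item as it looks *after* the write, which is enough
# for triggering downstream processing of new files.
dynamodb.update_table(
    TableName="file-metadata",
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_IMAGE",
    },
)
```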
✅ With DynamoDB Streams, you can be sure that each stream record appears exactly once in the stream.
❌ The downside of Streams is that you cannot attach any extra information to the record; a stream record carries only the DynamoDB item data, so you have to design your business logic accordingly. A consumer sketch is below.
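A minimal sketch of a stream consumer, again with hypothetical names. Lambda polls the stream and invokes the handler with batches of change records; here we forward only INSERT events (new files) to the processing queue:

```python
import json

import boto3

sqs = boto3.client("sqs")
# Hypothetical queue URL for the processing stage.
TARGET_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/processing"


def handler(event, context):
    # Lambda delivers batches of change records from the stream.
    for record in event["Records"]:
        # React only to new items; MODIFY and REMOVE are skipped.
        if record["eventName"] != "INSERT":
            continue

        # NewImage is present because the table uses the NEW_IMAGE
        # stream view type. Attributes stay in DynamoDB's typed format.
        new_image = record["dynamodb"]["NewImage"]
        sqs.send_message(
            QueueUrl=TARGET_QUEUE_URL,
            MessageBody=json.dumps({"file": new_image["pk"]["S"]}),
        )
```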
Thank you for reading, let’s chat 💬
💬 Do you have any experience with DynamoDB Streams you’d like to share?
💬 Do you want to share any tips you use while designing data workloads?
💬 Which topic on data processing workloads do you want me to cover?
I love hearing from readers 🫶🏻 Please feel free to drop comments, questions, and opinions below👇🏻