Post-mortem: Why several users got spammed with emails

Recently, we switched our Celery broker to Redis. This greatly improved the way we handle tasks, but it came with a caveat we did not foresee.

Because of this caveat, the follow-up email we send to users 4 days after they subscribe was sent multiple times.

We have tracked down the issue and changed the way we send emails, but this does not undo the number of emails several users received.
We are deeply sorry about this incident and have taken measures so that it does not happen again.

The issue

When using the Redis Celery broker, tasks that are not executed within a certain time after a worker receives them are redelivered - reference.

Copying from the Celery documentation on the Redis broker:

If a task is not acknowledged within the Visibility Timeout the task will be redelivered to another worker and executed.

Since each task was scheduled to run 96 hours after a user was created, and the task was being redelivered every hour (the default Visibility Timeout), there was a theoretical limit of 95 emails being sent to a single user.
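The arithmetic behind that limit can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not code from our service: the function name max_redeliveries is made up, and it assumes one redelivery per elapsed Visibility Timeout with a one-hour timeout. The broker_transport_options dictionary at the end shows the Celery setting that controls the timeout, with an illustrative value; it is not the fix we applied.

```python
import math


def max_redeliveries(delay_hours: float, visibility_timeout_hours: float = 1.0) -> int:
    """Upper bound on redeliveries of a task held unacknowledged for delay_hours.

    Assumption: the broker redelivers the task once every time a full
    visibility timeout elapses before the task is due.
    """
    if delay_hours <= visibility_timeout_hours:
        return 0
    return math.ceil(delay_hours / visibility_timeout_hours) - 1


# A follow-up scheduled 96 hours ahead with a one-hour timeout.
print(max_redeliveries(96))        # 95
# The same task with a timeout longer than the delay is never redelivered.
print(max_redeliveries(96, 120))   # 0

# The Celery setting that controls the timeout, in seconds.
# Illustrative value only: 5 days, longer than the 4-day delay.
broker_transport_options = {"visibility_timeout": 5 * 24 * 60 * 60}
```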

Fortunately, we deploy new code several times a week, and the duplicate tasks were being rescheduled only once after each deployment, which significantly reduced the number of emails actually delivered to a user. The most emails sent to a single user was 87, which happened during the weekend.

How we solved the issue

When we became aware of the issue, we immediately canceled pending emails and started working on an alternative way to tackle the issue.

We have created a daily periodic task, run by Celery beat, that collects all users created 4 days ago and dispatches their emails.
Because no task waits in the queue for days anymore, this improves performance and makes sure duplicate emails will not be sent again.
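A minimal sketch of what such a daily job could look like follows. It is not our production code: the function names, the shape of the user records, and the task name emails.send_follow_up_emails are all hypothetical, and it assumes creation times are stored as timezone-aware datetimes. The selection logic is plain Python, and the beat_schedule dictionary uses the standard Celery beat format with a one-day interval.

```python
from datetime import datetime, timedelta, timezone

FOLLOW_UP_DELAY = timedelta(days=4)


def follow_up_window(now):
    """Return the [start, end) UTC day that lies four days before `now`.

    `now` is expected to be a timezone-aware datetime.
    """
    today = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    start = today - FOLLOW_UP_DELAY
    return start, start + timedelta(days=1)


def users_due_for_follow_up(users, now):
    """Select users whose creation time falls inside the follow-up window."""
    start, end = follow_up_window(now)
    return [user for user in users if start <= user["created_at"] < end]


# Celery beat configuration that would run the job once a day.
beat_schedule = {
    "send-follow-up-emails": {
        "task": "emails.send_follow_up_emails",  # hypothetical task name
        "schedule": timedelta(days=1),
    },
}
```

Each user falls inside exactly one daily window, so a single run picks them up once, and nothing has to wait in the broker for longer than the Visibility Timeout.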

Thoughts for future improvements

We have solved the case for now, but spamming is one of our worst nightmares and definitely something we don't want to come from our service. Thus, we have started working on some thoughts on how to effectively rate limit emails to users, so that incidents like this one never happen again.
The first thought is limiting the number of emails we can send to a user during a specific period of time and alerting our administration team when this limit is reached.
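To make the idea concrete, here is a rough sketch of a per-user limit over a sliding time window, with a callback that stands in for the alert. Everything in it is an assumption for illustration: the EmailRateLimiter class, its parameters, and the in-memory storage are hypothetical, and a real deployment with several workers would need a shared store instead of a local dictionary.

```python
import time
from collections import defaultdict, deque


class EmailRateLimiter:
    """Allow at most max_emails per user within period_seconds."""

    def __init__(self, max_emails, period_seconds, on_limit_reached=None,
                 clock=time.monotonic):
        self.max_emails = max_emails
        self.period_seconds = period_seconds
        self.on_limit_reached = on_limit_reached  # e.g. notify administrators
        self.clock = clock
        self._sent = defaultdict(deque)  # user id -> timestamps of sent emails

    def allow(self, user_id):
        """Return True and record the send if the user is under the limit."""
        now = self.clock()
        sent = self._sent[user_id]
        # Forget sends that have fallen out of the window.
        while sent and now - sent[0] >= self.period_seconds:
            sent.popleft()
        if len(sent) >= self.max_emails:
            if self.on_limit_reached is not None:
                self.on_limit_reached(user_id)
            return False
        sent.append(now)
        return True


# Usage: at most 2 emails per user per hour, collecting alerts in a list.
alerts = []
limiter = EmailRateLimiter(max_emails=2, period_seconds=3600,
                           on_limit_reached=alerts.append)
for _ in range(3):
    limiter.allow("user-1")
print(alerts)  # ['user-1']
```

The sending code would call allow() before dispatching each email and skip the send when it returns False.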

We are constantly doing our best to create an awesome service for you and we give our word that we'll continue to do so.

Be creative and enjoy coding in your browser.