Cluster A - Inbound Delays
Incident Report for OpenSRS
Postmortem

Incident Date: July 8, 2021
Incident Number: PR-2142

On July 8, 2021 at 11:57 AM ET, Tucows’ hosted email platform experienced a service interruption that caused inbound and outbound email delays for Prod A. Tucows’ Engineering team was engaged to investigate the issue.

The service interruption was caused by a planned maintenance change that triggered an rspam email processing loop.

At 12:23 PM ET, the Engineering team reverted the change and started monitoring the issue. Since no further errors were observed, the incident was marked resolved. 

At 2:04 PM ET, we noticed further impact to outbound emails in Prod A. At 3:00 PM ET, the engineering team added resources to rspam to handle the high load on the system.

On July 9, 2021 at 1:43 PM ET, the engineering team successfully reverted the changes to the legacy system and restored service.

Tucows is committed to implementing preventive measures: deploying new configuration to improve email processing, and enhancing monitoring and change management processes for better visibility, QA validation, and faster recovery.

Thank you,

Tucows Engineering Team

Posted Jul 21, 2021 - 20:20 UTC

Resolved
Our engineering team has identified the cause of the delays as a recent update made to the hosted email system. The changes were reverted and, after monitoring the result, we see message delivery back to normal performance.

Incident Start Time: 07-08-2021 15:57:00 UTC
Incident End Time: 07-08-2021 16:23:00 UTC
Total Duration: 26 minutes
Posted Jul 08, 2021 - 17:28 UTC
Identified
Our Engineering team has identified that this issue was caused by a system change. They have rolled the system back to its previous state and are monitoring to confirm recovery.
Posted Jul 08, 2021 - 16:50 UTC
Investigating
We are currently experiencing inbound mail delays on Cluster A. Our engineering team is investigating the issue.

We will provide an update once we have additional information.
Posted Jul 08, 2021 - 16:04 UTC
This incident affected: Hosted Email (Cluster A).