Cluster A - Sending issue
Incident Report for OpenSRS
Postmortem

Incident Date: July 8, 2021
Incident Number: PR-2142

On July 8, 2021 at 11:57 AM ET, Tucows’ hosted email platform experienced a service interruption that caused inbound and outbound email delays for Prod A. Tucows’ Engineering team was engaged to investigate the issue.

The service interruption was caused by a planned maintenance change that triggered an rspam email processing loop.

At 12:23 PM ET, the Engineering team reverted the change and began monitoring the platform. Because no further errors were observed, the incident was marked resolved.

At 2:04 PM ET, we observed further impact to outbound email in Prod A. At 3:00 PM ET, the Engineering team added resources to rspam to handle the high load on the system.

On July 9, 2021 at 1:43 PM ET, the Engineering team successfully reverted the changes to the legacy system and restored the services.

Tucows is committed to implementing preventive measures, including deploying a new configuration to improve email processing and enhancing our monitoring and change management processes to provide better visibility, QA validation, and faster recovery.

Thank you,

Tucows Engineering Team

Posted Jul 21, 2021 - 20:19 UTC

Resolved
Our hosted email engineers identified the issues causing outbound delays and rejections and have remedied the situation. This incident is now resolved.

Start Time: 18:04 UTC
End Time: 19:00 UTC
Total Duration: 56 Minutes
Posted Jul 08, 2021 - 19:38 UTC
Investigating
Our email team is currently investigating an issue impacting Cluster A users. Users may experience issues sending email. We will provide updates as they become available.
Posted Jul 08, 2021 - 18:27 UTC
This incident affected: Hosted Email (Cluster A).