Cluster B - Inbound Delivery
Incident Report for OpenSRS
Postmortem

Incident Date: November 30, 2020
Incident Number: PR-1548

On November 30, 2020 at 11:16 AM ET, the Tucows Hosted Email platform experienced inbound email deferrals affecting Prod B.

The issue was caused by high inbound mail volume, which led to socket timeouts.

At 1:48 PM ET, the Tucows engineering team performed emergency maintenance to update the configuration of the underlying infrastructure and improve stability.

At 4:40 PM ET, the Hosted Email engineering team rolled back a spam filter change; this rollback resolved the mail deferral issues.

Tucows engineering will improve logging and enhance monitoring visibility so that issues like this are identified in a timely manner.

Thank you,

Tucows Engineering Team

Posted Dec 02, 2020 - 17:32 UTC

Resolved
The Engineering team has completed the emergency maintenance. Users should no longer experience inbound delays.

Incident Start Time: 11-30-2020 16:16:00 UTC
Incident End Time: 11-30-2020 21:40:00 UTC
Total Duration: 5 hours, 24 minutes
Posted Nov 30, 2020 - 22:12 UTC
Update
In addition to the improvements already made, our engineering team is working to roll back some previously deployed changes to improve stability.

We roughly estimate one more hour until the rollback completes. We will continue to post updates as we receive them.
Posted Nov 30, 2020 - 21:11 UTC
Update
Our engineering team has made enhancements to the affected infrastructure services that were causing delays. Users should see improvements within the hour. We are monitoring the changes and will post an update once we have the all clear.
Posted Nov 30, 2020 - 20:12 UTC
Identified
We are currently performing emergency maintenance that should address the issue. Users should not notice any additional service outage during the maintenance.
Posted Nov 30, 2020 - 19:20 UTC
Update
Our team is actively implementing a means to reduce the frequency of bounces and deferred mail. Users should soon notice an improvement.
Posted Nov 30, 2020 - 17:59 UTC
Investigating
A subset of users may be experiencing issues receiving mail on Cluster B. Senders may receive the following bounceback:

451 4.7.1 Service unavailable - try again later

Alternatively, mail may be accepted but delayed by up to ~45 minutes.

Our team is currently investigating.
Posted Nov 30, 2020 - 16:58 UTC
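A note for senders on the bounceback above: under RFC 5321, a reply starting with 4 (such as 451) is a transient failure, so a compliant sending server keeps the message queued and retries later, while a reply starting with 5 is permanent. The "4.7.1" is an RFC 3463 enhanced status code whose first digit carries the same class. The Python sketch below classifies a single SMTP reply line this way. The names `parse_smtp_reply` and `SmtpReply` are hypothetical illustrations, not part of any Tucows tool, and the sketch handles only single-line replies.

```python
import re
from typing import NamedTuple, Optional


class SmtpReply(NamedTuple):
    code: int                # three-digit basic reply code, e.g. 451
    enhanced: Optional[str]  # RFC 3463 enhanced status code, e.g. "4.7.1", if present
    text: str                # human-readable remainder of the reply line
    transient: bool          # True for 4yz replies, which senders should retry later


# Basic reply code, separator, optional enhanced code (class.subject.detail), free text.
_REPLY_RE = re.compile(r"^([2-5]\d\d)[ -](?:([245]\.\d{1,3}\.\d{1,3})\s+)?(.*)$")


def parse_smtp_reply(line: str) -> SmtpReply:
    """Parse one SMTP reply line and classify it as transient (4yz) or not."""
    match = _REPLY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an SMTP reply line: {line!r}")
    code = int(match.group(1))
    return SmtpReply(
        code=code,
        enhanced=match.group(2),
        text=match.group(3),
        transient=400 <= code < 500,
    )


if __name__ == "__main__":
    reply = parse_smtp_reply("451 4.7.1 Service unavailable - try again later")
    print(reply.code, reply.enhanced, reply.transient)  # 451 4.7.1 True
```

Because 451 is classified as transient, a sending server that receives it should defer and retry rather than return the message to its author. This is consistent with the update's note that mail could be accepted after a delay.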
This incident affected: Hosted Email (Cluster B).