Cluster B login issues
Incident Report for OpenSRS
Postmortem

Incident Date: November 19, 2021
Incident Number: PR-2593

On November 19, 2021 at 7:55 PM ET, Tucows’ hosted email platform experienced a service interruption impacting Webmail in Prod B. Tucows’ Network Engineering team was engaged to investigate the issue.

The service interruption was caused by a DDoS attack. The abusive traffic was mitigated; however, it caused unexpected behaviour with one of our load balancers.

Tucows’ Engineering team increased the severity of the incident once the external impact was observed.

At 9:55 PM ET, the Engineering team failed the services over to the secondary load balancer and restarted the webmail nodes in a controlled manner, successfully restoring all services.

Tucows will continue working with its existing vendors to improve DDoS mitigation services so that DDoS attacks are addressed and rectified in a timely manner.

Tucows will work with the vendor to investigate the root cause of the unexpected behaviour on the load balancer.

Tucows will enhance monitoring for better visibility.

Thank you,

Tucows Engineering Team

Posted Nov 24, 2021 - 16:34 UTC

Resolved
The engineering team has confirmed that Cluster B is stable and has resolved the issue.

Incident Start Time: 11-20-2021 00:55:00 UTC
Incident End Time: 11-20-2021 02:55:00 UTC
Total Duration: 2 hours
Posted Nov 20, 2021 - 03:44 UTC
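
The postmortem gives the incident times in ET, while the Resolved update above gives them in UTC. As a cross-check, the following minimal sketch converts one to the other and recomputes the duration. It assumes Eastern Standard Time (UTC-5) was in effect on November 19, 2021; the variable names are illustrative and not taken from any Tucows tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Eastern Standard Time (UTC-5) applied on November 19, 2021.
EST = timezone(timedelta(hours=-5), name="EST")

# Times as stated in the postmortem.
start_et = datetime(2021, 11, 19, 19, 55, tzinfo=EST)  # 7:55 PM ET
end_et = datetime(2021, 11, 19, 21, 55, tzinfo=EST)    # 9:55 PM ET

# Convert to UTC and compute the elapsed time.
start_utc = start_et.astimezone(timezone.utc)
end_utc = end_et.astimezone(timezone.utc)
duration = end_utc - start_utc

fmt = "%m-%d-%Y %H:%M:%S UTC"
print("Incident Start Time:", start_utc.strftime(fmt))
print("Incident End Time:", end_utc.strftime(fmt))
print("Total Duration:", duration)
```

Under that assumption, the converted values agree with the start time, end time, and two-hour duration reported in the Resolved update.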
Monitoring
Our engineers continue to restart the webmail nodes, and the service appears to be functional at this time.

We will continue to monitor the issue and provide further details as they become available.
Posted Nov 20, 2021 - 03:15 UTC
Update
Our Engineering team has failed the service over to a secondary system and is currently restarting webmail nodes to bring the service back up.

Further updates to come.
Posted Nov 20, 2021 - 02:52 UTC
Update
Our engineering team continues to investigate the root cause of the Cluster B login issues and will provide further updates as they become available.
Posted Nov 20, 2021 - 02:02 UTC
Investigating
We are aware of degraded service affecting Cluster B, with customers experiencing intermittent login issues.

Our engineering team is currently investigating.
Posted Nov 20, 2021 - 01:27 UTC
This incident affected: Hosted Email (Cluster B).