Intermittent service interruptions
Incident Report for OpenSRS
Postmortem

Incident Date: October 13, 2020
Incident Number: PR-1417

On October 13, 2020, at 10:20 AM ET, we experienced a brief interruption on the Domains and Hosted Email platforms, impacting Hover, the Reseller Control Panel, IMAP, POP, and Webmail in Prod A.

The service interruption was caused by a network interruption with our transit provider. At 10:21 AM ET, services were restored without any intervention.

At 10:39 AM ET, Tucows encountered another service interruption, impacting the Domains platform only. At 10:58 AM ET, services recovered without any intervention.

Tucows will work with the vendor to investigate the root cause of the issue.

Tucows will revise and enhance monitoring for better visibility.

Thank you,

Tucows Engineering Team

Posted Oct 16, 2020 - 15:40 UTC

Resolved
This incident is now resolved and services are back to normal.

Incident Start Time: 10-13-2020 14:20:00 UTC
Incident End Time: 10-13-2020 14:58:00 UTC

Total Duration: ~38 mins
Posted Oct 13, 2020 - 15:55 UTC
Monitoring
We have engaged our provider to investigate the issue. All services have recovered successfully. To prevent further service interruptions, the Network Engineering team has shut down some of our provider's sessions and diverted the traffic to the redundant link.

All services should be back to normal now.

We will continue to monitor the situation to ensure all problems are resolved.
Posted Oct 13, 2020 - 15:17 UTC
Investigating
We are currently experiencing an incident that is impacting Domains, Hosted Email, and Webmail (Cluster A only). The cause is intermittent service interruptions with our transit provider. We are engaging our provider to investigate the root cause of the issue.

Next Update: Within 30 minutes
Posted Oct 13, 2020 - 14:55 UTC
This incident affected: Hosted Email (Cluster A, Cluster B, Webmail) and Control Panels (Reseller Control Panel).