Cluster A - email delays

Incident Report for OpenSRS

Postmortem

Incident Date: July 26, 2020
Incident Number: PR-1215

On July 26, 2020, at 10:50 AM ET, the Tucows Hosted Email platform experienced service degradation and email delivery delays affecting Cluster A.

The service interruption was caused by high load on a network storage device.

At 1:35 PM ET, the Engineering team stopped all the processes that were causing the high load and brought the services back online.

Tucows is increasing resources to spread the load further and prevent future interruptions.

Thank you,

Tucows Engineering Team

Posted Jul 28, 2020 - 15:31 UTC

Resolved

Our engineering team has resolved the issue on Cluster A that was causing longer load times and email delays.

Incident Start Time: 07-26-2020 14:50:00 UTC
Incident End Time: 07-26-2020 17:35:00 UTC
Total Duration: 2 hours 45 mins
Posted Jul 26, 2020 - 18:21 UTC

Update

The engineering team is still working to resolve the identified issues and alleviate the load. We will provide further updates shortly.

Customers may experience longer load times and email delays.
Posted Jul 26, 2020 - 17:21 UTC

Identified

The engineering team has identified issues that were causing high load on one of our storage devices and is currently working to resolve them and alleviate the load.

Customers may experience longer load times and email delays.
Posted Jul 26, 2020 - 16:21 UTC

Investigating

We are experiencing a high-load issue on one of our storage devices, which is causing email delays in Cluster A. We have engaged the engineering team, and they are currently investigating the issue.
Posted Jul 26, 2020 - 15:24 UTC
This incident affected: Hosted Email (Cluster A, Webmail).