Webmail & IMAP Login Failures - Cluster B
Incident Report for OpenSRS
Postmortem

Incident Date: December 03, 2019
Incident Number: PR-644

On December 3, 2019, at 11:15 am ET, the Tucows HostedEmail platform experienced service degradation and email delivery delays, impacting inbound traffic in PROD B.

The service degradation was caused by a high load on a network storage device due to a disk failure.

At 14:34 ET, the Operations team restored all services and stabilized the load in the email environment. At 20:17 ET, the Operations team successfully completed the disk rebuild.

Preventive measures: As part of the ongoing stabilization efforts in PROD B, Tucows will prioritize the migration of mail stores on the affected hardware onto high-performance storage to prevent further client impact.

Thank you,

Tucows Operations

Posted Dec 06, 2019 - 18:12 UTC

Resolved
All Cluster B email issues are now resolved and users should no longer receive any errors when logging into their webmail.

A storage device failure resulted in higher loads across Cluster B, which in turn degraded IMAP/Webmail/POP services. Temporarily disabling IMAP nodes to lighten the load while the necessary repairs were made helped bring the cluster back online.
Posted Dec 03, 2019 - 20:10 UTC
Monitoring
IMAP, POP and Webmail services are now back up for all customers. Users will start to see emails arrive in their inboxes as the system is brought back up.

Next Update at 7:30 PM UTC
Posted Dec 03, 2019 - 19:04 UTC
Update
Our engineering team continues to work on mitigating the impact of this issue. Some users may be able to access their webmail if they are also able to access their accounts via IMAP. Otherwise, the issue will still be present.

Next Update at 7:00 p.m. UTC
Posted Dec 03, 2019 - 18:28 UTC
Update
We continue to investigate login failures on cluster B. Users will be unable to log in via Webmail and may experience login or connection degradation via IMAP.
Posted Dec 03, 2019 - 17:40 UTC
Update
We are currently investigating reports of login failures via webmail for cluster B. Users will be unable to log into their accounts at this time.

Users will be able to access email via IMAP/SMTP through email clients during this outage.
Posted Dec 03, 2019 - 16:50 UTC
Investigating
We are currently investigating reports of login failures via webmail for cluster B. Users will be unable to log into their accounts at this time.
Posted Dec 03, 2019 - 16:40 UTC
This incident affected: Hosted Email (Cluster B, Webmail).