Cluster A Outage
Incident Report for OpenSRS
Postmortem

Summary: On May 2, 2024, one of the components within our Hosted Email infrastructure failed. When the initial recovery effort did not succeed, we upgraded the software to fix the problem. The failure coincided with a Denial of Service attack, which greatly increased traffic levels and left webmail unresponsive. We then blocked the abusive IP addresses, which allowed the authentication service to recover. Webmail response times returned to expected levels that day. To confirm that our actions had resolved the issue, we continued to monitor until May 6 and then closed the incident.

Client Impact: Webmail and login were slow or unresponsive. Email delivery was delayed due to a short-term IMAP/POP failure.

Remediation: We’re taking the following actions to prevent a recurrence:

  • Authentication service reengineering: We’re reengineering the software to scale to higher levels of traffic, which will bring wait times back to expected levels. We’ll also provide better tools to manage the authentication process.
  • Proactively arranging maintenance windows: A set of quarterly change windows will be scheduled to address any software updates and preventative maintenance required on the hosted email infrastructure. We'll ensure you're aware of updates/maintenance windows and their impact in advance, and we'll schedule them to occur when our server traffic is typically at its lowest.
  • Increased abuse mitigation efforts: We’re exploring new options to reduce the occurrence and impact of abuse.

Thank you,

Tucows Domains Operations Team

Posted May 10, 2024 - 14:37 UTC

Resolved
We monitored Cluster A's Webmail responsiveness over the weekend and confirmed that it has returned to expected levels.

We are closing this case and thank you for your patience.

Start Time: 05/02/2024 09:00 UTC
End Time: 05/06/2024 12:46 UTC
Total duration: 4 days 3 hours 46 minutes.
Posted May 06, 2024 - 12:48 UTC
Update
We continue to monitor and work towards a full restoration of the Webmail service, which is experiencing interruptions.

We appreciate your patience.

We will post further updates as they become available.
Posted May 03, 2024 - 22:06 UTC
Update
Our team is still monitoring webmail as delays persist. All activity is expected to return to normal soon. Additional updates will be provided as they become available.

We appreciate your patience.
Posted May 03, 2024 - 14:31 UTC
Update
Our team continues to monitor webmail, as delays are ongoing.

We will provide further updates for you on this minor incident as they become available.
Posted May 02, 2024 - 20:40 UTC
Monitoring
Our team has implemented a fix; IMAP and POP services are working again as expected.

We will continue to monitor the changes, and webmail is expected to be back in full service shortly.
Posted May 02, 2024 - 19:17 UTC
Update
We are continuing to work on a fix for this issue. We appreciate your patience.

The next update will be within one hour.
Posted May 02, 2024 - 18:05 UTC
Update
We continue to work to restore full services. Webmail is currently experiencing interruptions due to an ongoing spam campaign.

Inbound email continues to be delivered to mailboxes without issue. Outbound email services may be degraded for some users.

Our next update will be within one hour.
Posted May 02, 2024 - 17:04 UTC
Update
We are continuing to work on this major issue affecting email on Cluster A, including webmail, IMAP, and POP. Your patience is appreciated as we work towards a resolution for restoring full service.
The next update will be within one hour.
Posted May 02, 2024 - 15:50 UTC
Update
At this time users may be unable to access email via IMAP, POP, or webmail.
We continue to work towards resolving the reported issues and will post another update once we have more information to share.
Posted May 02, 2024 - 13:14 UTC
Update
We continue to work towards a resolution. We will have another update once new details are available.
Posted May 02, 2024 - 12:32 UTC
Update
The issue with IMAP has been resolved. Our engineers are continuing to work on resolving webmail slowness and login failures.
We will provide another update shortly.
Posted May 02, 2024 - 11:13 UTC
Identified
Our engineers have identified the problem and are working to bring Cluster A back up. We will update again shortly.
Thank you for your patience.
Posted May 02, 2024 - 10:43 UTC
Investigating
We are currently experiencing a known issue impacting Hosted Email Cluster A.
At this time you may not be able to log in via Webmail, IMAP, or POP3. Our engineers are aware of the situation and working to resolve it.
We will post updates as soon as more information is available.
Thank you for your patience.
Posted May 02, 2024 - 09:11 UTC
This incident affected: Hosted Email (Cluster A, Webmail).