Email Cluster A Status Archive
Email Cluster A is Online
We have just a bit more information on the email issue we reported earlier. It turns out the only issue was with our internal email here at Tucows (Tucows personnel sending and receiving).
We sounded the alarm in case our own experience represented a broader problem with the service. In fact, the Email service was not actually degraded.
In a related story, "Wolf!!!".
Email Cluster A is Online
The earlier mail flow issues have now been resolved by our Operations and Development teams. Upon review, we determined that there was no impact to customer email service.
Email Cluster A is Degraded
We are currently experiencing an intermittent issue affecting our Cluster A Email Service. Sending and receiving of email may be disrupted in some cases.
Our Operations team is working to resolve this issue.
Email Cluster A is Online
The maintenance on OpenSRS Email Cluster 'A' finished ahead of schedule and all services have been restored.
Email Cluster A is In Maintenance
Scheduled maintenance on OpenSRS Email Service Cluster 'A' will begin very shortly.
During this four-hour maintenance period, OpenSRS Email Service Cluster 'A' will be unavailable, including POP, IMAP, Webmail and SMTP services, as well as provisioning.
PLEASE NOTE: End users of Email Service Cluster 'A' will not have access to their mailboxes during the maintenance period. All inbound messages will be queued for delivery after the maintenance is complete.
Email Cluster A is Online
The previous notice was sent in error. The planned Cluster 'A' maintenance is scheduled to begin at 01:00 EST. Cluster 'A' is fully available at this time.
Email Cluster A is Online
Our earlier mitigation efforts appear to have helped, and we have not seen any recurrence of this weekend's intermittent outage.
It's been nearly 24 hours since our last event and we're continuing to work with our storage vendor to bring this issue to full resolution.
In the meantime, we feel confident that our emergency changes have helped to bypass the problem until a permanent fix can be implemented, so we are changing the status to Online and will continue to monitor the platform closely.
Once again, we appreciate your patience and apologize for the trouble this event may have caused you and your end users.
This update is related to Incident 20414
Email Cluster A is Degraded
We've been closely monitoring the intermittent performance issues affecting Email Cluster A. After exhaustive testing, we believe we can rule out the load balancer as the cause of the behaviour and have focused our efforts on the cluster's storage service.
Although we haven't seen the symptoms since 14:22 EST yesterday afternoon, we know that load is a factor in these events. To mitigate the effects of load and reduce the chance of a recurrence, we worked throughout the night on preparations to keep write latency to a minimum and to ensure disk writes run as quickly and efficiently as possible during peak load.
Our current focus is to continue working with our vendor's kernel/filesystem experts to identify and resolve the root cause affecting the storage service.
We sincerely apologize for the inconvenience this issue has caused you and your customers.
This update is related to Incident 20414
Email Cluster A is Degraded
We continue to monitor Cluster A closely.
Users may experience short periods (less than 10 minutes) where access via POP, IMAP and Webmail is unavailable. We've taken steps to minimize the impact of these periods and to reduce their frequency.
At the same time, we are working to determine the root cause of the issue.
Once again, we're very sorry about the inconvenience to you and your customers.
This update is related to Incident 20414
Email Cluster A is Degraded
We've made some progress toward resolving the intermittent POP3/IMAP/Webmail issues on Cluster A.
While working to rule out the load balancer update as the root cause, we identified some unusual behaviour related to the storage cluster. Our logging indicates high-latency network filesystem writes on the storage cluster that appear to coincide with the intermittent outage events.
As we continue that work on the load balancer update, we are also investigating the storage cluster in parallel to determine whether it is a contributing factor to the intermittent connectivity issue.
Impact to end user mailboxes is improving, with the 5-10 minute periods of intermittent connection problems and slowness occurring less often. Earlier today, these periods recurred roughly every 45 minutes; as of this update, the interval is closer to 1 hour 45 minutes.
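For a rough sense of what the change in interval means, the figures above can be turned into an approximate share of time affected. This is only an illustrative sketch, not a measurement from our monitoring: it assumes exactly one 5-10 minute event per interval and that the interval is measured from the start of one event to the start of the next. The function name impacted_fraction is hypothetical.

```python
def impacted_fraction(event_minutes: float, interval_minutes: float) -> float:
    """Approximate share of time affected, assuming one event per interval.

    Assumption: the interval runs from the start of one event to the
    start of the next, so the share is simply event / interval.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return min(event_minutes / interval_minutes, 1.0)


# Intervals quoted in the update, in minutes (1 hour 45 minutes = 105).
intervals = {"earlier today": 45, "as of this update": 105}

for label, interval in intervals.items():
    low = impacted_fraction(5, interval)    # shortest reported event
    high = impacted_fraction(10, interval)  # longest reported event
    print(f"{label}: roughly {low:.1%} to {high:.1%} of the time")
```

Under these assumptions, the affected share drops from somewhere between a ninth and two ninths of the time to well under a tenth.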

