Email Cluster A Status
Degraded
Currently, 50% of customer mailboxes on this cluster are fully available, and those customers can send and receive email normally. Forward-only and filter-only accounts are also functioning, and their mail is being delivered. However, some filter-only users will be unable to log in to the Spam quarantine.
The remaining 50% of customer mailboxes are offline. Those users have no access to their mailboxes and cannot send or receive mail at this time. Inbound mail for these customers is being queued and will be delivered once service is restored. Affected users logging in to webmail may see a "Service Unavailable" error message. Users with email clients will receive timeout errors and will not be able to send or receive new mail.
As anticipated, once the rebuild of the first affected mailstore completed, we determined that we should not restore full service to those users. Restoring service for those users while the other mailstores were being rebuilt would have had too great an impact on the service overall.
Rebuilding of the other affected mailstores is continuing. The estimated completion time for those mailstores remains 10 P.M. EST.
We will keep you updated. The next update will be provided at approximately 4:30 P.M. EST.
Additionally, we have some more detailed information for you on the nature of the fault that led to this incident:
Q: What happened to cause this degradation?
A: During our scheduled maintenance, the firmware upgrade of our NetApp caused a failure of a controlling disk head in the storage pool.
Q: Why didn't you just roll back the firmware upgrade?
A: During the firmware upgrade, a number of disks became marked as 'bad', triggering a RAID-level rebuild by the system. Once this rebuild is triggered, it must complete. Reverting to the previous firmware would do nothing to change this situation. Exactly why the firmware update triggered this rebuild is not known. We are working with NetApp to determine a root cause. We have performed dozens of firmware upgrades to this type of module on other filers of the same model in the past, and have never experienced a similar result.
Q: Is this related to the Cluster 'A' service interruptions you experienced last year?
A: We were upgrading the firmware on the NetApp storage devices to address issues related to last year's service interruptions. We tested the firmware upgrade beforehand and had previously performed upgrades like this with no issue. We are working with NetApp to investigate why both disk heads reacted in a way that caused a rebuild. We will provide you with a full incident summary when we have those answers.
This update is related to Incident 4743.

