Connection issues affecting Whois, Domains, and API
Incident Report for OpenSRS
Postmortem

On February 08, 2019 at 12:30 AM ET, Tucows’ Domains platform experienced a service interruption. The incident impacted OpenSRS, HRS, and Hover domain registrations.

The incident stemmed from a power supply failure. The server chassis rebooted after one of its power supply units (PSUs) failed. At 2:35 AM, services were restored and the integrity of the systems was confirmed.

On February 08, 2019 at 8:50 AM ET, the server chassis rebooted again while the defective PSU was being replaced. At 11:23 AM, services were restored.

This incident exposed hardware defects and an architecture gap. Tucows will investigate the cause of the reboots with the vendor. Tucows will also revise its systems and services distribution architecture to ensure service redundancy and prevent future interruptions.

Posted Feb 13, 2019 - 21:26 UTC

Resolved
We are happy to report the system has been in a steady state with no problems. The issue is now resolved.
Posted Feb 08, 2019 - 17:23 UTC
Monitoring
The incident has been resolved. The affected services have been in a steady state since recovering from the APIs/Whois/Domains incident. Our engineering team continues to monitor and validate the integrity of the production systems to prevent future failures.

We will be keeping our status in monitoring for the next while and keeping a close eye on the system.
Posted Feb 08, 2019 - 17:00 UTC
Identified
Our operations team has identified the problem and is actively working on a solution. When we have additional information, we will update our posting.
Posted Feb 08, 2019 - 15:57 UTC
Investigating
The issue has resurfaced, and our team is once again working on a resolution.
Posted Feb 08, 2019 - 14:28 UTC
Monitoring
Services have been restored. While there is still some work to be done, there is no service impact at this time. We will continue to monitor the issue until the work is complete.
Posted Feb 08, 2019 - 08:16 UTC
Update
Our operations team has been able to address the issue and is working to restore full service at this time. Service impact at this point is minimal. An update will be provided once the issue is fully resolved.
Posted Feb 08, 2019 - 07:27 UTC
Identified
We have identified an issue with API errors that is preventing domain lookups, registrations, and renewals, and we are currently working on it. We will update again once this is resolved.
Posted Feb 08, 2019 - 06:39 UTC
This incident affected: APIs (OpenSRS API, OpenHRS API, Email API) and Domain Services (Core gTLDs, Core ccTLDs, Other gTLDs, Other ccTLDs, WHOIS).