API/Control Panel failures
Incident Report for OpenSRS
Postmortem

Incident Date: July 12, 2021
Incident Number: PR-2140

On July 12, 2021, at 6:20 AM ET, the Tucows Domain platform experienced a service interruption affecting the Reseller Control Panel, the API, and the domain lookup service.

The service interruption was caused by an increase in domain lookup requests that overloaded system resources.

At 9:40 AM ET, the engineering team restarted the affected systems and cleared the queue of expired messages, which resolved the issue.

Tucows will review and update the system to make it more resilient against similar issues in the future.

Tucows will enhance its triage and troubleshooting documentation to better identify the severity of such issues.

Thank you,

Tucows Engineering Team

Posted Jul 14, 2021 - 18:37 UTC

Resolved
Our engineering teams have restored services for both the SRS and HRS environments. We will be conducting a full investigation to identify the root cause. This incident is now closed.

Incident Start Time: 07-12-2021 10:20:00 UTC
Incident End Time: 07-12-2021 13:40:00 UTC
Total Duration: 3 hours and 20 minutes
Posted Jul 12, 2021 - 14:01 UTC
Update
We are continuing to investigate this issue.
Posted Jul 12, 2021 - 13:11 UTC
Investigating
We are currently experiencing connection failures to the API and RCP. Our engineering teams have been engaged to investigate. Resellers and HRS customers may experience difficulties with registrations, renewals, and transfers.
Posted Jul 12, 2021 - 13:04 UTC
This incident affected: APIs (OpenSRS API, OpenHRS API) and Control Panels (Reseller Control Panel, Classic RWI).