Our operations team has identified and fixed the root cause of an intermittent API performance issue that affected some OpenSRS resellers. The issue occurred on October 1, 2017, between 6:45pm and 9:45pm EDT, and again on October 2, 2017, between 9:20am and 10:20am EDT.
Description of the issue:
One of the country-code registries we are connected to limits the number of checks we can perform. Once our system reached this limit, the registry stopped responding, and the connections opened by additional requests stayed open and accumulated. Our system is designed to close such connections; however, new connections arrived faster than unresponsive ones were cancelled. Under normal circumstances and load, this does not affect resellers. In this case, however, it coincided with an extremely high and atypical volume of domain activity, which amplified the problem and created a cascading effect that made the API and reseller control panel intermittently unavailable.
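The failure mode can be illustrated with a toy model. The rates used here are hypothetical and are not measurements from our systems: whenever unanswered requests arrive faster than the cleanup process can cancel them, the number of open connections grows every second instead of draining.

```python
def open_connections_over_time(arrivals_per_sec: int, cancels_per_sec: int, seconds: int) -> list[int]:
    """Return how many unresponsive connections remain open at the end of each second.

    Hypothetical model: every new request opens a connection that the registry
    never answers, and a cleanup process can cancel at most `cancels_per_sec`
    of them per second.
    """
    open_now = 0
    history = []
    for _ in range(seconds):
        open_now += arrivals_per_sec                 # new requests that will never get a reply
        open_now -= min(open_now, cancels_per_sec)   # cleanup cannot cancel more than are open
        history.append(open_now)
    return history


# Arrivals outpace cancellations: the backlog grows by 10 every second.
print(open_connections_over_time(50, 40, 5))  # [10, 20, 30, 40, 50]
# Cancellations keep up: nothing accumulates.
print(open_connections_over_time(30, 40, 3))  # [0, 0, 0]
```

In a real system each accumulated connection also holds resources such as sockets and worker slots, which is how a backlog like this can spread to unrelated requests.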
API and control panel performance returned to normal at around 10:30am EDT on October 2, 2017.
Description of the fix:
We have implemented a system-wide check that monitors the connections used for domain lookups and proactively cancels any connection that remains unresponsive after a short period of time. The associated API requests now also receive an appropriate response from our systems, which prevents API connections from accumulating in an unresponsive state. This code change was promoted to our production environment at 5:00pm EDT on October 2, 2017.
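The general technique of timing out a lookup, cancelling it, and answering the caller with an explicit error can be sketched with Python's `asyncio`. This is a minimal illustration that assumes an asyncio-based lookup service. The function names, the simulated registry delay, and the timeout value are hypothetical and do not represent our production code.

```python
import asyncio

# Hypothetical timeout; the value used in production is not published.
LOOKUP_TIMEOUT_SECONDS = 2.0


async def registry_check(domain: str, delay: float) -> str:
    """Hypothetical stand-in for a registry lookup; `delay` simulates the registry's reply time."""
    await asyncio.sleep(delay)
    return f"{domain}: available"


async def check_with_timeout(domain: str, delay: float,
                             timeout: float = LOOKUP_TIMEOUT_SECONDS) -> dict:
    """Run a lookup and cancel it if the registry does not answer within `timeout` seconds."""
    try:
        result = await asyncio.wait_for(registry_check(domain, delay), timeout)
        return {"domain": domain, "status": "ok", "result": result}
    except asyncio.TimeoutError:
        # wait_for has already cancelled the pending lookup, so the connection
        # does not linger; the caller gets an explicit error instead of hanging.
        return {"domain": domain, "status": "error", "result": "registry did not respond"}


async def main() -> None:
    # One registry answers quickly; the other never answers within the timeout.
    results = await asyncio.gather(
        check_with_timeout("example.com", delay=0.01, timeout=0.5),
        check_with_timeout("example.ca", delay=5.0, timeout=0.05),
    )
    for r in results:
        print(r)


if __name__ == "__main__":
    asyncio.run(main())
```

The key design choice is that a timeout both frees the connection and produces a response. Cancelling the lookup alone would still leave the caller waiting.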
We consider this issue resolved, and we apologize for any inconvenience it may have caused.
This API performance issue is unrelated to the Network Connectivity Issue we experienced on September 29, 2017. Although both incidents affected the availability of the API, their root causes are entirely different. We apologize for the unfortunate timing of these issues.
As always, please contact OpenSRS support for help or additional information.