API best practices: Checking for busyness

It is important to check whether API resources are busy before you make requests to them. For example, you can’t reliably make updates to an environment while it’s changing runstates (for example, moving from suspended to running). When an environment is busy, its components (VMs, networks, VPNs, Private Network Connections, ICNR tunnels, etc.) are also busy.

As a best practice, your automation scripts should:

- Check the resource status before and after making a request
- Include retry logic

Check the resource status before and after making a request

Check the status of the resource before making a PUT or POST request.
- Before submitting a PUT or POST request on a VM or environment, use a GET request to check the resource’s runstate field. Don’t send PUT or POST requests while the resource reports "runstate": "busy". Retry the GET request (if needed) until you can validate that the VM or environment is in the correct runstate for the operation you want to perform (for example, "runstate": "stopped").
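The pre-request check above amounts to polling until the runstate matches what the next operation needs. The following is a minimal sketch, not an official client: `wait_for_runstate` and `fetch_resource` are hypothetical names, and `fetch_resource` is assumed to be a callable you supply that performs the GET request and returns the parsed JSON body as a dict. The 10-second interval follows the retry guidance in this article; the attempt cap is an assumption.

```python
import time
from typing import Any, Callable, Dict


def wait_for_runstate(
    fetch_resource: Callable[[], Dict[str, Any]],
    target_runstate: str,
    interval_seconds: float = 10.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a VM or environment until its runstate equals target_runstate.

    fetch_resource is a hypothetical, caller-supplied function that issues the
    GET request and returns the decoded JSON body.
    """
    for attempt in range(max_attempts):
        resource = fetch_resource()
        if resource.get("runstate") == target_runstate:
            # Safe to send the PUT or POST request now.
            return resource
        # Still busy (or in some other runstate): wait before checking again.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(
        f"runstate did not become {target_runstate!r} after {max_attempts} checks"
    )
```

In a real script, `fetch_resource` would wrap your HTTP client's GET call for the VM or environment URL. The `sleep` parameter exists only so the loop can be exercised without actually waiting.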
Check the response code and the status of the resource after making a PUT or POST request.
- A 200 status code indicates that the authentication was successful and the request was properly formatted; however, it doesn’t indicate a completed update to the resource. Check for both a 200 status code and an indication that the change was successfully made.
  - For example, the API may return a 200 code after receiving a PUT request to change a VM runstate, even if the VM won’t start running until some time after the request has been made. Continue polling the VM until you can verify that the VM is running.
  - Similarly, the API may return a 200 status code for a PUT or POST request even if the VM is busy and can’t be edited. If you receive a 200 code and the runstate is busy, retry after 10 seconds.
  - Some resources (such as environments) contain an error field that is populated when there is a problem with the resource (for example, if the VM can’t perform an operation because VMware Tools is still loading, or if an import job failed because it contained an unsupported VM type). Generally, these error messages linger until the next successful state change, regardless of whether the issue still persists.
- A 422 status code may indicate that a runstate change was attempted on a busy resource. Poll the resource until you can validate that the VM or environment is in the correct runstate for the operation you want to perform (for example, "runstate": "stopped").
- A 423 status code indicates that the resource can’t be edited due to rate limiting, busyness, or another factor. Generally, a response with a 423 status code includes a Retry-After header. Retry the request after the number of seconds indicated in the header (for example, Retry-After: 30 means retry the request after 30 seconds or more; more time may be needed if additional activity occurs during the wait period).
  
  The Environments resource contains a rate_limited Boolean. If an environment is being rate-limited due to high amounts of activity in the account, rate_limited will be true.
  
  Rate limiting is more likely to occur if you run or suspend a large number of VMs in a short period of time. Kyndryl Cloud Uplift applies rate limiting based on high levels of activity across your customer account.
- A 429 status code indicates that the resource can’t be edited due to HTTP rate limiting. Generally, a response with a 429 status code includes a Retry-After header. Retry the request after the number of seconds indicated in the header (for example, Retry-After: 30 means retry the request after 30 seconds or more; more time may be needed if additional activity occurs during the wait period).
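The status-code guidance above can be condensed into a small decision function. This is a minimal sketch, not an official client: `decide`, `Decision`, and `retry_after_seconds` are hypothetical names, the caller is assumed to pass the response status, headers, and parsed JSON body, and falling back to 10 seconds when `Retry-After` is missing or not a plain number of seconds is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

DEFAULT_WAIT_SECONDS = 10  # retry interval suggested in this article


@dataclass
class Decision:
    action: str            # "done", "retry", or "fail"
    wait_seconds: int = 0  # minimum wait before the next attempt


def retry_after_seconds(headers: Mapping[str, str], default: int = DEFAULT_WAIT_SECONDS) -> int:
    """Read Retry-After as seconds; fall back to a default (assumption)."""
    value = headers.get("Retry-After")  # plain dicts match case-sensitively
    if value is None:
        return default
    try:
        return max(int(value), 0)
    except ValueError:
        # Assumption: non-numeric values (for example, an HTTP date) use the default.
        return default


def decide(
    status_code: int,
    headers: Mapping[str, str],
    body: Optional[Mapping[str, Any]],
    expected_runstate: str,
) -> Decision:
    if status_code == 200:
        # 200 means the request was accepted, not that the change is finished.
        if (body or {}).get("runstate") == expected_runstate:
            return Decision("done")
        return Decision("retry", DEFAULT_WAIT_SECONDS)  # busy or still transitioning
    if status_code == 422:
        # May mean a runstate change was attempted on a busy resource: poll again.
        return Decision("retry", DEFAULT_WAIT_SECONDS)
    if status_code in (423, 429):
        # Locked or rate limited: Retry-After is a minimum, not an exact time.
        return Decision("retry", retry_after_seconds(headers))
    return Decision("fail")
```

For example, `decide(423, {"Retry-After": "30"}, None, "running")` returns `Decision(action='retry', wait_seconds=30)`. If you use the `requests` library, its response headers are case-insensitive, so `response.headers` can be passed directly.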

Include retry logic

When your script notices busyness or an HTTP status code indicating an error, it should automatically retry the previous command instead of proceeding to subsequent commands. How often you should retry depends on the urgency of the operation, but retrying the command once every 10 seconds is a good standard.
Once the response generates an HTTP status code of 200 and indicates that the resource is no longer busy, your script should cease retrying and proceed to the next step.
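The retry behavior described above can be sketched as a loop that repeats the same command until it succeeds. This is a minimal illustration under stated assumptions: `retry_until_not_busy` and `send_request` are hypothetical names, `send_request` is assumed to return the status code and parsed JSON body, and the attempt cap is an assumption added so a script can't loop forever.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

Response = Tuple[int, Optional[Dict[str, Any]]]


def retry_until_not_busy(
    send_request: Callable[[], Response],
    interval_seconds: float = 10.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Repeat one command until it returns 200 and the resource isn't busy."""
    for attempt in range(1, max_attempts + 1):
        status_code, body = send_request()
        not_busy = body is not None and body.get("runstate") != "busy"
        if status_code == 200 and not_busy:
            return status_code, body  # success: proceed to the next step
        if attempt < max_attempts:
            # Busy or an error status: wait, then retry the same command.
            sleep(interval_seconds)
    raise RuntimeError(f"request still busy or failing after {max_attempts} attempts")
```

A fixed 10-second interval keeps the sketch simple; when a 423 or 429 response carries a Retry-After header, waiting at least that many seconds instead of the fixed interval is the safer choice.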