Zaui I/O

Implementing Exponential Back-Off

Exponential back-off is a strategy in which a client retries a failed request, waiting progressively longer between attempts. It is a standard error-handling strategy for network applications. Besides being “required”, exponential back-off uses bandwidth more efficiently, reduces the number of requests needed to get a successful response, and maximizes request throughput in concurrent environments.
The flow of implementing a simple exponential back-off is as follows:
  1. Make a request to the API.
  2. Receive the response and check for an error with a retry-able error code (such as 503).
  3. Wait 1s + random_number_milliseconds.
  4. Retry the request.
  5. Receive the response and check for an error with a retry-able error code (such as 503).
  6. Wait 2s + random_number_milliseconds.
  7. Retry the request.
  8. Receive the response and check for an error with a retry-able error code (such as 503).
  9. Wait 4s + random_number_milliseconds.
  10. Retry the request.
  11. Receive the response and check for an error with a retry-able error code (such as 503).
  12. Wait 8s + random_number_milliseconds.
  13. Retry the request.
  14. Receive the response and check for an error with a retry-able error code (such as 503).
  15. Wait 16s + random_number_milliseconds.
  16. Retry the request.
  17. If you still get an error, stop and log the error.
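Note: random_number_milliseconds MUST be regenerated for each “Wait”; do not reuse the same value across waits.
In the above flow, random_number_milliseconds is a random number of milliseconds less than or equal to 1000. This random component is necessary to avoid certain lock errors in some concurrent implementations, where many clients would otherwise retry at exactly the same moments.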
Note: the wait is always (2^n) seconds + random_number_milliseconds, where n is a monotonically increasing integer that starts at 0 and is incremented by 1 for each iteration (each retried request).
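As a rough illustration of the wait formula, the following Python sketch computes a single delay. The function name `backoff_delay` and the constant `MAX_JITTER_MS` are hypothetical names chosen for this example, not part of any Zaui API.

```python
import random

MAX_JITTER_MS = 1000  # upper bound for random_number_milliseconds, as described above


def backoff_delay(n, rng=random):
    """Return the wait in seconds before retry number n (n starts at 0).

    The result is 2^n seconds plus a fresh random jitter of 0 to 1000 ms.
    """
    jitter_ms = rng.randint(0, MAX_JITTER_MS)  # regenerated on every call
    return (2 ** n) + jitter_ms / 1000.0
```

Calling `backoff_delay(0)` yields a value between 1 and 2 seconds, `backoff_delay(1)` between 2 and 3 seconds, and so on up to `backoff_delay(4)`, which falls between 16 and 17 seconds.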
The algorithm terminates when n == 5. This ceiling prevents clients from retrying indefinitely. It results in a total delay of roughly 31 to 36 seconds (1 + 2 + 4 + 8 + 16 seconds of fixed waits, plus up to 1 second of random delay per wait) before the error is deemed “unrecoverable.”
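The whole flow can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a definitive client: `send_request` is a hypothetical zero-argument callable that returns a `(status_code, body)` tuple, and the set of retry-able codes contains only 503 because that is the one example given above.

```python
import random
import time

RETRYABLE_STATUS = {503}  # assumption: only the example code from the text
MAX_RETRIES = 5           # the flow stops once n reaches 5


def call_with_backoff(send_request, sleep=time.sleep, log=print):
    """Call send_request, retrying retry-able failures with exponential back-off.

    send_request is a hypothetical callable returning (status_code, body).
    sleep and log are injectable so the loop can be tested without real waits.
    """
    status, body = send_request()
    n = 0
    while status in RETRYABLE_STATUS and n < MAX_RETRIES:
        jitter_ms = random.randint(0, 1000)  # fresh jitter for every wait
        sleep(2 ** n + jitter_ms / 1000.0)   # 1s, 2s, 4s, 8s, 16s plus jitter
        n += 1
        status, body = send_request()
    if status in RETRYABLE_STATUS:
        # Still failing after the final retry: stop and log the error.
        log(f"giving up after {n} retries, last status {status}")
    return status, body
```

In this sketch, a request that keeps returning 503 is sent six times in total (the initial attempt plus five retries) before the error is logged. Passing a custom `sleep` function is a design choice that lets the loop be exercised in tests without actually waiting.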