Handling errors

    Errors can be divided into three categories:

    • Temporary failures (retryable, hereinafter R): These include a short-term loss of network connectivity, temporary unavailability or overload of a YDB subsystem, or a failure of YDB to respond to a query within the set timeout. If one of these errors occurs, retrying the failed query after a certain period of time is likely to succeed.

    • Errors that can’t be fixed by a retry (non-retryable, hereinafter N): These include invalid operations, such as inserting a row with an existing primary key value into a table or inserting data that mismatches the table schema.

    • Errors that can presumably be fixed by a retry after the client app responds (conditionally retryable, hereinafter C): These include no response within the set timeout or an authentication request.

    Retry an operation only if the error indicates a temporary failure. Don’t attempt to retry invalid operations, such as inserting a row with an existing primary key value into a table or inserting data that mismatches the table schema.
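
    The rule above can be sketched as a small retry loop. This is only an illustration: TemporaryFailure, InvalidOperation, and run_with_retries are hypothetical names, not part of the YDB SDK, and delays between attempts are left out for brevity.

```python
class TemporaryFailure(Exception):
    """Hypothetical marker for retryable (R) errors; not an SDK class."""


class InvalidOperation(Exception):
    """Hypothetical marker for errors a retry cannot fix, e.g. a duplicate key."""


def run_with_retries(operation, max_attempts=3):
    """Call operation(); retry only on TemporaryFailure, up to max_attempts calls."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TemporaryFailure:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the last temporary failure
        # Any other exception (such as InvalidOperation) is not caught here,
        # so it propagates to the caller after a single attempt.
```

    A real application would map the SDK's own error types onto these categories instead of defining its own exception classes.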

    It’s extremely important to optimize the number of retries and the interval between them. Too many retries at too short an interval cause excessive load. Too few retries can prevent the operation from completing.

    When selecting an interval, the following strategies are usually used:

    • Exponential backoff. For each subsequent attempt, the interval increases exponentially.
    • Incremental intervals. For each subsequent attempt, the interval increases by a set increment.
    • Constant intervals. Retries are made at the same intervals.
    • Instant retry. Retries are made immediately.
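
    As a rough illustration, the four strategies listed above could be expressed as functions that return the delay in seconds before a given retry. The function names, default values, and the upper limit (cap) on the exponential delay are assumptions made for this sketch. Here, attempt is zero-based: 0 means the first retry.

```python
def exponential_backoff(attempt, base=0.1, factor=2.0, cap=10.0):
    """Delay grows as base * factor**attempt and never exceeds cap (an assumed limit)."""
    return min(cap, base * factor ** attempt)


def incremental_interval(attempt, base=0.1, step=0.5):
    """Delay grows by a fixed step with every attempt."""
    return base + step * attempt


def constant_interval(attempt, interval=1.0):
    """Delay is the same for every attempt."""
    return interval


def instant_retry(attempt):
    """No delay at all."""
    return 0.0
```

    For example, with base=1.0 the exponential strategy waits 1, 2, 4, and 8 seconds before the first four retries, while the incremental strategy with step=0.5 waits 1, 1.5, 2, and 2.5 seconds.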

    When you select an interval and the number of retries, consider the YDB termination statuses.

    Don’t repeat instant retries more than once.
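
    One possible way to respect this limit is to allow a single instant retry and then switch to growing intervals. The sketch below assumes that combination; delay_schedule and its default values are hypothetical and not part of the SDK.

```python
def delay_schedule(max_retries, base=0.5, factor=2.0):
    """Yield the delay (in seconds) before each retry.

    The first retry is instant; every later retry waits exponentially longer.
    """
    for retry in range(max_retries):
        if retry == 0:
            yield 0.0  # the single instant retry
        else:
            yield base * factor ** (retry - 1)
```

    With the defaults, list(delay_schedule(4)) produces the delays 0.0, 0.5, 1.0, and 2.0 seconds.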

    When using the SDK, we recommend logging all errors and exceptions:

    • Log the number of retries made. An increase in the number of regular retries often indicates that there are issues.
    • Log all errors, including their types, termination codes, and their causes.
    • Log the total operation execution time, including operations that terminate after retries.
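
    The logging recommendations above might look roughly like this in application code. The sketch uses only the Python standard library. call_with_logging is a hypothetical helper, and the built-in ConnectionError and TimeoutError stand in for the SDK's retryable errors. SDK termination codes are not shown because their exact form depends on the SDK in use.

```python
import logging
import time

logger = logging.getLogger("retry_sketch")


def call_with_logging(operation, max_retries=3, delay=0.0,
                      retryable=(ConnectionError, TimeoutError),
                      name="operation"):
    """Run operation(), retrying assumed-retryable errors and logging the outcome."""
    started = time.monotonic()
    retries = 0
    try:
        while True:
            try:
                return operation()
            except Exception as exc:
                # Log every error with its type and cause, retryable or not.
                logger.warning("%s failed: type=%s cause=%s",
                               name, type(exc).__name__, exc)
                if not isinstance(exc, retryable) or retries >= max_retries:
                    raise
                retries += 1
                time.sleep(delay)
    finally:
        # Log the retry count and total time, including time spent on retries.
        elapsed = time.monotonic() - started
        logger.info("%s finished: retries=%d total_time=%.3fs",
                    name, retries, elapsed)
```

    Because the summary line is written in a finally block, it is emitted both for operations that eventually succeed and for those that fail after the retries are exhausted.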

    Termination statuses

    Below are termination statuses that can be returned when working with the SDK.

    Error types:

    • R (retryable): Temporary failures
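    • N (non-retryable): Errors that can’t be fixed by a retry
    • C (conditionally retryable): Errors that can presumably be fixed by a retry after the client app responds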