For example, a file may intermittently fail to read and then read successfully later. In this scenario, one of the disks has likely gone bad, causing one of the replicas to return incorrect data to the user.

    To recover the volume, we can identify the corrupted replica and remove it from the volume:

    1. Log in to each node that hosts a replica of the volume and change to the directory that holds the replica data.

      For example, the replica might be stored at:

    2. Compute a checksum for every file under that directory, for example with `sha256sum`.

    3. Compare the checksum output across the replicas. The corrupted replica is the one whose checksum run fails or whose results differ from the others. That is the replica to remove from the volume.

    4. Use the Longhorn UI to remove the identified replica from the volume.

    5. Scale the workload back up and verify that the error no longer occurs.
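Steps 2 and 3 can be scripted. The following is a minimal sketch, not a Longhorn tool. It assumes Python 3 is available on each node and that you supply the replica directory path yourself. The script name `replica_check.py`, the `scan`/`compare` subcommands, the JSON manifest format, and the choice of SHA-256 are all illustrative assumptions. The comparison uses a simple per-file majority vote, so it is only meaningful with at least three replicas. It also assumes the replicas use the same file names; files that exist on only some replicas are reported as `MISSING` and should be reviewed by hand. Because step 5 scales the workload back up, run the scans while the workload is scaled down so the files are not changing.

```python
import hashlib
import json
import os
import sys
from collections import Counter

READ_ERROR = "READ_ERROR"  # marker: the file raised an OSError while being read
MISSING = "MISSING"        # marker: the file is absent from this replica


def checksum_dir(root, chunk_size=1 << 20):
    """Return {relative_path: sha256 hex digest} for every file under root."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            digest = hashlib.sha256()
            try:
                # Read in chunks so large image files never need to fit in memory.
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(chunk_size), b""):
                        digest.update(chunk)
                manifest[rel] = digest.hexdigest()
            except OSError:
                # A failing disk may surface as an I/O error; record it and keep scanning.
                manifest[rel] = READ_ERROR
    return manifest


def find_suspect_replicas(manifests):
    """Given {replica_name: manifest}, return {replica_name: [differing files]}.

    For each file, the value reported by most replicas is the reference.
    Any replica that disagrees, failed to read the file, or lacks it is flagged.
    """
    all_files = set()
    for manifest in manifests.values():
        all_files.update(manifest)
    suspects = {}
    for rel in sorted(all_files):
        values = {name: m.get(rel, MISSING) for name, m in manifests.items()}
        majority, _count = Counter(values.values()).most_common(1)[0]
        for name, value in values.items():
            if value != majority:
                suspects.setdefault(name, []).append(rel)
    return suspects


def main(argv):
    # Hypothetical CLI:
    #   python3 replica_check.py scan <replica-dir> > nodeA.json   (run on each node)
    #   python3 replica_check.py compare nodeA.json nodeB.json nodeC.json
    if len(argv) >= 2 and argv[0] == "scan":
        json.dump(checksum_dir(argv[1]), sys.stdout, indent=2, sort_keys=True)
    elif len(argv) >= 3 and argv[0] == "compare":
        manifests = {}
        for path in argv[1:]:
            with open(path) as f:
                manifests[path] = json.load(f)
        suspects = find_suspect_replicas(manifests)
        if not suspects:
            print("All replicas agree.")
        for name, files in sorted(suspects.items()):
            print(f"{name}: {len(files)} differing file(s): {', '.join(files)}")
    else:
        print("usage: scan <dir> | compare <manifest.json> <manifest.json>...", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv[1:])
```

If one disk is returning bad data, the `compare` step should flag only the manifest from that replica's node. Treat this as a hint for step 4 rather than proof, and confirm the result before you remove the replica in the Longhorn UI.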