2.4. CouchDB Replication Protocol

    The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

    2.4.1.2. Goals

    The primary goal of this specification is to describe the CouchDB Replication Protocol under the hood.

    The secondary goal is to provide enough detailed information about the protocol to make it easy to build tools, in any language and on any platform, that can synchronize data with CouchDB.

    2.4.1.3. Definitions

    JSON:

    JSON is a text format for the serialization of structured data. It is described in RFC 4627.

    URI:

    A URI is defined by RFC 3986. It can be a URL as defined in RFC 1738.

    ID:

    An identifier (could be a UUID) as described in RFC 4122.

    Revision:

    An MVCC token value of the following pattern: N-sig, where N is ALWAYS a positive integer and sig is the Document signature (custom). Don’t mix it up with the revision in version control systems!

    Leaf Revision:

    The last Document Revision in a series of changes. Documents may have multiple Leaf Revisions (aka Conflict Revisions) due to concurrent updates.

    Document:

    A document is a JSON object with an ID and Revision defined in _id and _rev fields respectively. A Document’s ID MUST be unique within the Database where it is stored.

    Database:

    A collection of Documents with a unique URI.

    Changes Feed:

    A stream of Document-changing events (create, update, delete) for the specified Database.

    Sequence ID:

    An ID provided by the Changes Feed. It MUST be incremental, but is not necessarily an integer.

    Source:

    Database from where the Documents are replicated.

    Target:

    Database where the Documents are replicated to.

    Replication:

    The one-way directed synchronization process of Source and Target endpoints.

    Checkpoint:

    Intermediate Recorded Sequence ID used for Replication recovery.

    Replicator:

    A service or an application which initiates and runs Replication.

    Filter Function:

    A special function, written in any programming language, that is used to filter Documents during Replication.

    Filter Function Name:

    An ID of a Filter Function that may be used as a symbolic reference (aka callback function) to apply the related Filter Function to Replication.

    Filtered Replication:

    Replication of Documents from Source to Target using a Filter Function.

    Full Replication:

    Replication of all Documents from Source to Target.

    Push Replication:

    Replication process where Source is a local endpoint and Target is remote.

    Pull Replication:

    Replication process where Source is a remote endpoint and Target is local.

    Continuous Replication:

    Replication that “never stops”: after processing all events from the Changes Feed, the Replicator doesn’t close the connection, but awaits new change events from the Source. The connection is kept alive by periodic heartbeats.

    Replication Log:

    A special Document that holds Replication history (recorded Checkpoints and a few more statistics) between Source and Target.

    Replication ID:

    A unique value that unambiguously identifies the Replication Log.

    2.4.2. Replication Protocol Algorithm

    The CouchDB Replication Protocol is not magical, but an agreement on usage of the public CouchDB HTTP API to enable Documents to be replicated from Source to Target.

    The reference implementation, written in Erlang, is provided as part of Apache CouchDB.

    It is RECOMMENDED that one follow this algorithm specification, use the same HTTP endpoints, and run requests with the same parameters to provide a completely compatible implementation. Custom Replicator implementations MAY use different HTTP API endpoints and request parameters depending on their local specifics, and they MAY implement only part of the Replication Protocol to run only Push or Pull Replication. However, while such solutions could also run the Replication process, they lose compatibility with the CouchDB Replicator.

    2.4.2.1. Verify Peers

    The Replicator MUST ensure that both Source and Target exist by using HEAD /{db} requests.

    2.4.2.1.1. Check Source Existence

    2.4.2.1.2. Check Target Existence

    Request:

    HEAD /target HTTP/1.1
    Host: localhost:5984
    User-Agent: CouchDB

    Response:

    HTTP/1.1 200 OK
    Cache-Control: must-revalidate
    Content-Type: application/json
    Date: Sat, 05 Oct 2013 08:51:11 GMT
    Server: CouchDB (Erlang/OTP)

    2.4.2.1.3. Create Target?

    In case of a non-existent Target, the Replicator MAY make a request to create the Target:

    Request:

    PUT /target HTTP/1.1
    Accept: application/json
    Host: localhost:5984
    User-Agent: CouchDB

    Response:

    HTTP/1.1 201 Created
    Content-Length: 12
    Content-Type: application/json
    Date: Sat, 05 Oct 2013 08:58:41 GMT
    Server: CouchDB (Erlang/OTP)

    {
        "ok": true
    }

    However, the Replicator’s PUT request may not succeed due to insufficient privileges (which are granted by the provided credential), in which case it receives a 401 Unauthorized or another error response. Such errors SHOULD be expected and handled gracefully:

    HTTP/1.1 500 Internal Server Error
    Cache-Control: must-revalidate
    Content-Length: 108
    Content-Type: application/json
    Date: Fri, 09 May 2014 13:50:32 GMT
    Server: CouchDB (Erlang OTP)

    {
        "error": "unauthorized",
        "reason": "unauthorized to access or create database http://localhost:5984/target"
    }

    2.4.2.1.4. Abort

    In case of a non-existent Source or Target, Replication SHOULD be aborted with an HTTP error response:

    HTTP/1.1 500 Internal Server Error
    Cache-Control: must-revalidate
    Content-Length: 56
    Content-Type: application/json
    Date: Sat, 05 Oct 2013 08:55:29 GMT
    Server: CouchDB (Erlang OTP)

    {
        "error": "db_not_found",
        "reason": "could not open source"
    }
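
    The Verify Peers step can be sketched roughly as follows. This is a minimal illustration, not the CouchDB implementation: the request callable is a hypothetical transport that takes an HTTP method and a URL and returns the status code, and ReplicationAborted is a name introduced here for the example. Only the 200, 404 and 201 status codes shown above are handled.

```python
from typing import Callable

# Hypothetical transport: takes (method, url) and returns an HTTP status code.
Transport = Callable[[str, str], int]


class ReplicationAborted(Exception):
    """Raised when a peer is missing and Replication cannot proceed."""


def verify_peers(request: Transport, source: str, target: str,
                 create_target: bool = False) -> None:
    """Check that both peers exist, optionally creating the Target."""
    # The Source is only checked, never created.
    if request("HEAD", source) != 200:
        raise ReplicationAborted("could not open source")

    status = request("HEAD", target)
    if status == 200:
        return
    # Creating the Target is optional (MAY) and can itself fail,
    # for example when the credentials lack the needed privileges.
    if status == 404 and create_target and request("PUT", target) == 201:
        return
    raise ReplicationAborted("could not open target")
```

    Passing the transport in as a parameter keeps the sketch independent of any particular HTTP client library.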

    2.4.2.2. Get Peers Information

    + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -+
    ' Verify Peers:                                                    '
    '                         +------------------------+               '
    '                         | Check Target Existence |               '
    '                         +------------------------+               '
    '                                     |                            '
    '                                     | 200 OK                     '
    '                                     |                            '
    + - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - -+
                                          |
    + - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - -+
    ' Get Peers Information:              |                            '
    '                                     v                            '
    '                         +------------------------+               '
    '                         | Get Source Information |               '
    '                         +------------------------+               '
    '                         |      GET /source       |               '
    '                         +------------------------+               '
    '                                     |                            '
    '                                     | 200 OK                     '
    '                                     v                            '
    '                         +------------------------+               '
    '                         | Get Target Information |               '
    '                         +------------------------+               '
    '                         |      GET /target       |               '
    '                         +------------------------+               '
    '                                     |                            '
    '                                     | 200 OK                     '
    '                                     |                            '
    + - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - -+
                                          |
    + - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - -+
    ' Find Common Ancestry:               |                            '
    '                                     |                            '
    '                                     v                            '
    '                         +-------------------------+              '
    '                         | Generate Replication ID |              '
    '                         +-------------------------+              '
    '                                                                  '
    + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -+

    The Replicator retrieves basic information from both Source and Target using GET /{db} requests. Each GET response MUST contain a JSON object with the following mandatory fields:

    • instance_start_time (string): Always "0". (Returned for legacy reasons.)
    • update_seq (number / string): The current database Sequence ID.

    Any other fields are optional. The Replicator needs only the update_seq field: its value serves as a temporary upper bound (temporary because Database data is subject to change) for reading the Changes Feed and for calculating the statistics that show Replication progress.

    2.4.2.2.1. Get Source Information

    Request:

    GET /source HTTP/1.1
    Accept: application/json
    Host: localhost:5984
    User-Agent: CouchDB

    Response:

    HTTP/1.1 200 OK
    Cache-Control: must-revalidate
    Content-Length: 256
    Content-Type: application/json
    Date: Tue, 08 Oct 2013 07:53:08 GMT
    Server: CouchDB (Erlang OTP)

    {
        "committed_update_seq": 61772,
        "compact_running": false,
        "db_name": "source",
        "disk_format_version": 6,
        "doc_count": 41961,
        "doc_del_count": 3807,
        "instance_start_time": "0",
        "purge_seq": 0,
        "sizes": {
            "active": 70781613961,
            "disk": 79132913799,
            "external": 72345632950
        },
        "update_seq": 61772
    }

    2.4.2.2.2. Get Target Information

    Request:

    GET /target/ HTTP/1.1
    Accept: application/json
    Host: localhost:5984
    User-Agent: CouchDB

    Response:

    HTTP/1.1 200 OK
    Content-Length: 363
    Content-Type: application/json
    Date: Tue, 08 Oct 2013 12:37:01 GMT
    Server: CouchDB (Erlang/OTP)

    {
        "compact_running": false,
        "db_name": "target",
        "disk_format_version": 5,
        "doc_count": 1832,
        "doc_del_count": 1,
        "instance_start_time": "0",
        "purge_seq": 0,
        "sizes": {
            "active": 50829452,
            "disk": 77001455,
            "external": 60326450
        },
        "update_seq": "1841-g1AAAADveJzLYWBgYMlgTmGQT0lKzi9KdUhJMtbLSs1LLUst0k"
    }

    2.4.2.3. Find Common Ancestry

    + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
    ' Get Peers Information:                                                    '
    '                                                                           '
    '         +-------------------------------------------+                     '
    '         |          Get Target Information           |                     '
    '         +-------------------------------------------+                     '
    '                               |                                           '
    + - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - - +
                                    |
    + - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - - +
    ' Find Common Ancestry:         v                                           '
    '         +-------------------------------------------+                     '
    '         |          Generate Replication ID          |                     '
    '         +-------------------------------------------+                     '
    '                               |                                           '
    '                               |                                           '
    '                               v                                           '
    '         +-------------------------------------------+                     '
    '         |      Get Replication Log from Source      |                     '
    '         +-------------------------------------------+                     '
    '         |     GET /source/_local/replication-id     |                     '
    '         +-------------------------------------------+                     '
    '                               |                                           '
    '                               | 200 OK                                    '
    '                               | 404 Not Found                             '
    '                               v                                           '
    '         +-------------------------------------------+                     '
    '         |      Get Replication Log from Target      |                     '
    '         +-------------------------------------------+                     '
    '         |     GET /target/_local/replication-id     |                     '
    '         +-------------------------------------------+                     '
    '                               |                                           '
    '                               | 200 OK                                    '
    '                               | 404 Not Found                             '
    '                               v                                           '
    '         +-------------------------------------------+                     '
    '         |         Compare Replication Logs          |                     '
    '         +-------------------------------------------+                     '
    '                               |                                           '
    '                               | Use latest common sequence as start point'
    '                               |                                           '
    + - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - - +
                                    |
                                    |
    + - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - - +
    ' Locate Changed Documents:     |                                           '
    '                               |                                           '
    '                               v                                           '
    '         +-------------------------------------------+                     '
    '         |       Listen to Source Changes Feed       |                     '
    '         +-------------------------------------------+                     '
    '                                                                           '
    + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +

    2.4.2.3.1. Generate Replication ID

    Before Replication starts, the Replicator MUST generate a Replication ID. This value is used to track Replication history and to resume a previously interrupted Replication process.

    The Replication ID generation algorithm is implementation-specific. Whatever algorithm is used, it MUST uniquely identify the Replication process. CouchDB’s Replicator, for example, uses the following factors in generating a Replication ID:

    • Persistent Peer UUID value. For CouchDB, the local server UUID is used
    • Source and Target URIs, and whether Source and Target are local or remote Databases
    • If Target needed to be created
    • If Replication is Continuous
    • Any custom headers
    • Filter function code if used
    • Changes Feed query parameters, if any
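
    One possible way to derive such an ID is to hash a canonical serialization of the factors listed above. The sketch below is an illustration only and is not CouchDB's actual algorithm: the function name, the parameter names, the JSON serialization and the choice of SHA-256 are all assumptions made for this example, and the local/remote distinction is left out for brevity.

```python
import hashlib
import json


def replication_id(peer_uuid, source, target, *, create_target=False,
                   continuous=False, headers=None, filter_code=None,
                   query_params=None):
    """Derive a stable identifier from the factors that define a Replication."""
    factors = [
        peer_uuid,
        source,
        target,
        create_target,
        continuous,
        # Sort mappings so that key order does not change the result.
        sorted((headers or {}).items()),
        filter_code,
        sorted((query_params or {}).items()),
    ]
    encoded = json.dumps(factors).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

    The same inputs always yield the same ID, which is what allows an interrupted Replication to find its earlier Replication Log, while changing any factor (for example, switching to Continuous Replication) yields a different ID.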

    2.4.2.3.2. Retrieve Replication Logs from Source and Target

    Once the Replication ID has been generated, the Replicator SHOULD retrieve the Replication Log from both Source and Target using GET /{db}/_local/{docid}:

    The Replication Log SHOULD contain the following fields:

    • history (array of object): Replication history. Required
      • doc_write_failures (number): Number of failed writes
      • docs_read (number): Number of read documents
      • docs_written (number): Number of written documents
      • end_last_seq (number): Last processed Update Sequence ID
      • end_time (string): Replication completion timestamp in RFC 5322 format
      • missing_checked (number): Number of checked revisions on Source
      • missing_found (number): Number of missing revisions found on Target
      • recorded_seq (number): Recorded intermediate Checkpoint. Required
      • session_id (string): Unique session ID. Commonly, a random UUID value is used. Required
      • start_last_seq (number): Start update Sequence ID
      • start_time (string): Replication start timestamp in RFC 5322 format
    • replication_id_version (number): Replication protocol version. Defines the Replication ID calculation algorithm, HTTP API calls, and other routines. Required
    • session_id (string): Unique ID of the last session. Shortcut to the session_id field of the latest history object. Required
    • source_last_seq (number): Last processed Checkpoint. Shortcut to the recorded_seq field of the latest history object. Required

    This request MAY fail with a 404 Not Found response:

    Response:

    HTTP/1.1 404 Object Not Found
    Cache-Control: must-revalidate
    Content-Length: 41
    Content-Type: application/json
    Date: Tue, 08 Oct 2013 13:31:10 GMT
    Server: CouchDB (Erlang OTP)

    {
        "error": "not_found",
        "reason": "missing"
    }

    That’s OK. It means that there is no information about the current Replication, so it has presumably not been run before, and the Replicator MUST run a Full Replication.

    2.4.2.3.3. Compare Replication Logs

    If the Replication Logs are successfully retrieved from both Source and Target, then the Replicator MUST determine their common ancestry using the following algorithm:

    • Compare the session_id values of the chronologically last session. If they match, Source and Target have a common Replication history and it appears to be valid. Use the source_last_seq value as the startup Checkpoint
    • In case of a mismatch, iterate over the history collection to find the latest (chronologically) session_id common to Source and Target. Use the value of its recorded_seq field as the startup Checkpoint

    If Source and Target have no common ancestry, the Replicator MUST run a Full Replication.
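
    The comparison can be sketched as follows. This is a simplified illustration rather than the reference implementation: the function name is hypothetical, a missing Replication Log is represented by None, and the history arrays are assumed to be ordered from newest to oldest session.

```python
def find_start_checkpoint(source_log, target_log):
    """Return the startup Checkpoint, or None when a Full Replication is needed."""
    # A missing log on either side (404 Not Found) means no shared history.
    if source_log is None or target_log is None:
        return None

    # Fast path: the last sessions match, so the shortcut field can be used.
    if source_log["session_id"] == target_log["session_id"]:
        return source_log["source_last_seq"]

    # Slow path: look for the newest session recorded on both sides.
    # Assumes history entries are ordered newest first.
    target_sessions = {entry["session_id"]
                       for entry in target_log.get("history", [])}
    for entry in source_log.get("history", []):
        if entry["session_id"] in target_sessions:
            return entry["recorded_seq"]

    return None
```

    A None result corresponds to the no-common-ancestry case, in which the since parameter can be omitted when reading the Changes Feed.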

    2.4.2.4. Locate Changed Documents

    1. + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
    2. ' '
    3. ' +------------------------------+ '
    4. ' | Compare Replication Logs | '
    5. ' +------------------------------+ '
    6. ' | '
    7. ' | '
    8. + - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - +
    9. |
    10. + - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - +
    11. ' Locate Changed Documents: | '
    12. ' | '
    13. ' | '
    14. ' v '
    15. ' +-------------------------------+ '
    16. ' +------> | Listen to Changes Feed | -----+ '
    17. ' | +-------------------------------+ | '
    18. ' | | GET /source/_changes | | '
    19. ' | | POST /source/_changes | | '
    20. ' | +-------------------------------+ | '
    21. ' | | | '
    22. ' | | | '
    23. ' | There are new changes | | No more changes '
    24. ' | | | '
    25. ' | v v '
    26. ' | +-------------------------------+ +-----------------------+ '
    27. ' | | Read Batch of Changes | | Replication Completed | '
    28. ' | +-------------------------------+ +-----------------------+ '
    29. ' | | '
    30. ' | No | '
    31. ' | v '
    32. ' | +-------------------------------+ '
    33. ' | | Compare Documents Revisions | '
    34. ' | +-------------------------------+ '
    35. ' | | POST /target/_revs_diff | '
    36. ' | +-------------------------------+ '
    37. ' | 200 OK | '
    38. ' | v '
    39. ' | +-------------------------------+ '
    40. ' +------- | Any Differences Found? | '
    41. ' +-------------------------------+ '
    42. ' | '
    43. ' Yes | '
    44. ' | '
    45. + - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - +
    46. |
    47. + - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - +
    48. ' Replicate Changes: | '
    49. ' v '
    50. ' +-------------------------------+ '
    51. ' | Fetch Next Changed Document | '
    52. ' +-------------------------------+ '
    53. ' '
    54. + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +

    2.4.2.4.1. Listen to Changes Feed

    When the startup Checkpoint has been defined, the Replicator SHOULD read the Source’s Changes Feed by using a GET /{db}/_changes request. This request MUST be made with the following query parameters:

    • The feed parameter defines the Changes Feed response style: for Continuous Replication the continuous value SHOULD be used, otherwise normal.
    • The style=all_docs query parameter tells the Source that it MUST include all Revision leaves for each document’s event in the output.
    • For Continuous Replication the heartbeat parameter defines the heartbeat period in milliseconds. The RECOMMENDED default value is 10000 (10 seconds).
    • If a startup Checkpoint was found during the Replication Logs comparison, the since query parameter MUST be passed with this value. In case of Full Replication it MAY be 0 (number zero) or be omitted.

    Additionally, the filter query parameter MAY be specified to enable a filter function on Source side. Other custom parameters MAY also be provided.
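
    As an illustration of how these parameters combine, the following sketch builds the Changes Feed URL. The function and parameter names are hypothetical; only the query parameter names and values come from the text above.

```python
from urllib.parse import urlencode


def changes_url(source, *, continuous=False, since=None, heartbeat=10000,
                filter_name=None):
    """Build the Changes Feed URL for the given Source database URL."""
    params = {
        "feed": "continuous" if continuous else "normal",
        "style": "all_docs",
    }
    if continuous:
        # Heartbeat period in milliseconds keeps the connection alive.
        params["heartbeat"] = heartbeat
    if since is not None:
        # Startup Checkpoint; omitted for a Full Replication.
        params["since"] = since
    if filter_name is not None:
        params["filter"] = filter_name
    return f"{source}/_changes?{urlencode(params)}"
```

    For example, changes_url("http://localhost:5984/source", continuous=True, since=78) requests a continuous feed that starts after Sequence ID 78.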

    2.4.2.4.2. Read Batch of Changes

    Reading the whole feed in a single shot may not be an optimal use of resources. It is RECOMMENDED to process the feed in small chunks. However, there is no specific recommendation on chunk size, since it depends heavily on available resources: large chunks require more memory but reduce I/O operations, and vice versa.

    Note that the Changes Feed output format differs between a request with the feed=normal query parameter and one with feed=continuous.

    Normal Feed:

    Request:

    GET /source/_changes?feed=normal&style=all_docs&heartbeat=10000 HTTP/1.1
    Accept: application/json
    Host: localhost:5984
    User-Agent: CouchDB

    Response:

    HTTP/1.1 200 OK
    Cache-Control: must-revalidate
    Content-Type: application/json
    Date: Fri, 09 May 2014 16:20:41 GMT
    Server: CouchDB (Erlang OTP)
    Transfer-Encoding: chunked

    {"results":[
    {"seq":14,"id":"f957f41e","changes":[{"rev":"3-46a3"}],"deleted":true},
    {"seq":29,"id":"ddf339dd","changes":[{"rev":"10-304b"}]},
    {"seq":37,"id":"d3cc62f5","changes":[{"rev":"2-eec2"}],"deleted":true},
    {"seq":39,"id":"f13bd08b","changes":[{"rev":"1-b35d"}]},
    {"seq":41,"id":"e0a99867","changes":[{"rev":"2-c1c6"}]},
    {"seq":42,"id":"a75bdfc5","changes":[{"rev":"1-967a"}]},
    {"seq":43,"id":"a5f467a0","changes":[{"rev":"1-5575"}]},
    {"seq":45,"id":"470c3004","changes":[{"rev":"11-c292"}]},
    {"seq":46,"id":"b1cb8508","changes":[{"rev":"10-ABC"}]},
    {"seq":47,"id":"49ec0489","changes":[{"rev":"157-b01f"},{"rev":"123-6f7c"}]},
    {"seq":49,"id":"dad10379","changes":[{"rev":"1-9346"},{"rev":"6-5b8a"}]},
    {"seq":50,"id":"73464877","changes":[{"rev":"1-9f08"}]},
    {"seq":51,"id":"7ae19302","changes":[{"rev":"1-57bf"}]},
    {"seq":63,"id":"6a7a6c86","changes":[{"rev":"5-acf6"}],"deleted":true},
    {"seq":64,"id":"dfb9850a","changes":[{"rev":"1-102f"}]},
    {"seq":65,"id":"c532afa7","changes":[{"rev":"1-6491"}]},
    {"seq":66,"id":"af8a9508","changes":[{"rev":"1-3db2"}]},
    {"seq":67,"id":"caa3dded","changes":[{"rev":"1-6491"}]},
    {"seq":68,"id":"79f3b4e9","changes":[{"rev":"1-102f"}]},
    {"seq":69,"id":"1d89d16f","changes":[{"rev":"1-3db2"}]},
    {"seq":71,"id":"abae7348","changes":[{"rev":"2-7051"}]},
    {"seq":77,"id":"6c25534f","changes":[{"rev":"9-CDE"},{"rev":"3-00e7"},{"rev":"1-ABC"}]},
    {"seq":78,"id":"SpaghettiWithMeatballs","changes":[{"rev":"22-5f95"}]}
    ],
    "last_seq":78}

    Continuous Feed:

    Request:

    GET /source/_changes?feed=continuous&style=all_docs&heartbeat=10000 HTTP/1.1
    Accept: application/json
    Host: localhost:5984
    User-Agent: CouchDB

    Response:

    HTTP/1.1 200 OK
    Cache-Control: must-revalidate
    Content-Type: application/json
    Date: Fri, 09 May 2014 16:22:22 GMT
    Server: CouchDB (Erlang OTP)
    Transfer-Encoding: chunked

    {"seq":14,"id":"f957f41e","changes":[{"rev":"3-46a3"}],"deleted":true}
    {"seq":29,"id":"ddf339dd","changes":[{"rev":"10-304b"}]}
    {"seq":37,"id":"d3cc62f5","changes":[{"rev":"2-eec2"}],"deleted":true}
    {"seq":39,"id":"f13bd08b","changes":[{"rev":"1-b35d"}]}
    {"seq":41,"id":"e0a99867","changes":[{"rev":"2-c1c6"}]}
    {"seq":42,"id":"a75bdfc5","changes":[{"rev":"1-967a"}]}
    {"seq":43,"id":"a5f467a0","changes":[{"rev":"1-5575"}]}
    {"seq":45,"id":"470c3004","changes":[{"rev":"11-c292"}]}
    {"seq":46,"id":"b1cb8508","changes":[{"rev":"10-ABC"}]}
    {"seq":47,"id":"49ec0489","changes":[{"rev":"157-b01f"},{"rev":"123-6f7c"}]}
    {"seq":49,"id":"dad10379","changes":[{"rev":"1-9346"},{"rev":"6-5b8a"}]}
    {"seq":50,"id":"73464877","changes":[{"rev":"1-9f08"}]}
    {"seq":51,"id":"7ae19302","changes":[{"rev":"1-57bf"}]}
    {"seq":63,"id":"6a7a6c86","changes":[{"rev":"5-acf6"}],"deleted":true}
    {"seq":64,"id":"dfb9850a","changes":[{"rev":"1-102f"}]}
    {"seq":65,"id":"c532afa7","changes":[{"rev":"1-6491"}]}
    {"seq":66,"id":"af8a9508","changes":[{"rev":"1-3db2"}]}
    {"seq":67,"id":"caa3dded","changes":[{"rev":"1-6491"}]}
    {"seq":68,"id":"79f3b4e9","changes":[{"rev":"1-102f"}]}
    {"seq":69,"id":"1d89d16f","changes":[{"rev":"1-3db2"}]}
    {"seq":71,"id":"abae7348","changes":[{"rev":"2-7051"}]}
    {"seq":75,"id":"SpaghettiWithMeatballs","changes":[{"rev":"21-5949"}]}
    {"seq":77,"id":"6c255","changes":[{"rev":"9-CDE"},{"rev":"3-00e7"},{"rev":"1-ABC"}]}
    {"seq":78,"id":"SpaghettiWithMeatballs","changes":[{"rev":"22-5f95"}]}

    For both Changes Feed formats, the record-per-line style is preserved to simplify iterative fetching and decoding of JSON objects with a smaller memory footprint.
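
    Because of the record-per-line style, one small line-oriented reader can handle both formats. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it assumes that every change row occupies exactly one line, that rows in the normal feed may carry a trailing comma, and that empty lines are heartbeats.

```python
import json


def iter_changes(lines):
    """Yield change rows from an iterable of Changes Feed text lines."""
    for line in lines:
        # Drop surrounding whitespace and the row separator of the normal feed.
        line = line.strip().rstrip(",")
        # Skip heartbeats (empty lines) and the normal feed's wrapper lines,
        # such as '{"results":[' and '"last_seq":78}'.
        if not (line.startswith("{") and line.endswith("}")):
            continue
        row = json.loads(line)
        # Ignore objects that are not change rows, e.g. a trailing last_seq.
        if "seq" in row and "id" in row:
            yield row
```

    Since the function is a generator, rows can be consumed one by one as they arrive, without holding the whole feed in memory.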

    2.4.2.4.3. Calculate Revision Difference

    After reading the batch of changes from the Changes Feed, the Replicator forms a JSON object that maps each Document ID to its leaf Revisions and sends it to Target via a POST /{db}/_revs_diff request:

    Request:

    POST /target/_revs_diff HTTP/1.1
    Accept: application/json
    Content-Length: 287
    Content-Type: application/json
    Host: localhost:5984
    User-Agent: CouchDB

    {
        "baz": [
            "2-7051cbe5c8faecd085a3fa619e6e6337"
        ],
        "foo": [
            "3-6a540f3d701ac518d3b9733d673c5484"
        ],
        "bar": [
            "1-d4e501ab47de6b2000fc8a02f84a0c77",
            "1-967a00dff5e02add41819138abb3284d"
        ]
    }

    Response:

    HTTP/1.1 200 OK
    Cache-Control: must-revalidate
    Content-Length: 88
    Content-Type: application/json
    Date: Fri, 25 Oct 2013 14:44:41 GMT
    Server: CouchDB (Erlang/OTP)

    {
        "baz": {
            "missing": [
                "2-7051cbe5c8faecd085a3fa619e6e6337"
            ]
        },
        "bar": {
            "missing": [
                "1-d4e501ab47de6b2000fc8a02f84a0c77"
            ]
        }
    }

    In the response the Replicator receives a Document ID – Revisions mapping, but only for Revisions that do not exist in Target and are REQUIRED to be transferred from Source.

    If Target already has all of the requested Revisions, the response contains an empty JSON object:

    Request:

    POST /target/_revs_diff HTTP/1.1
    Accept: application/json
    Content-Length: 160
    Content-Type: application/json
    Host: localhost:5984
    User-Agent: CouchDB

    {
        "foo": [
            "3-6a540f3d701ac518d3b9733d673c5484"
        ],
        "bar": [
            "1-967a00dff5e02add41819138abb3284d"
        ]
    }

    Response:

    HTTP/1.1 200 OK
    Cache-Control: must-revalidate
    Content-Length: 2
    Content-Type: application/json
    Date: Fri, 25 Oct 2013 14:45:00 GMT
    Server: CouchDB (Erlang/OTP)

    {}
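
    The two halves of this exchange can be sketched as a pair of small helpers: one builds the request body from a batch of Changes Feed rows, the other reduces the response to the Revisions that still need to be fetched. Both function names are hypothetical and exist only for this illustration.

```python
def revs_diff_body(rows):
    """Map each Document ID in a batch of change rows to its leaf Revisions."""
    body = {}
    for row in rows:
        # A Document may appear more than once in a batch; merge its Revisions.
        body.setdefault(row["id"], []).extend(
            change["rev"] for change in row["changes"])
    return body


def missing_revisions(response):
    """Extract {Document ID: [missing Revisions]} from a _revs_diff response."""
    return {doc_id: info["missing"]
            for doc_id, info in response.items()
            if info.get("missing")}
```

    An empty result from missing_revisions means that nothing in the batch needs to be transferred and the Replicator can go back to reading the Changes Feed.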

    2.4.2.4.4. Replication Completed

    When there are no more changes left to process and no more Documents left to replicate, the Replicator finishes the Replication process. If Replication wasn’t Continuous, the Replicator MAY return a response to the client with statistics about the process:

    HTTP/1.1 200 OK
    Cache-Control: must-revalidate
    Content-Length: 414
    Content-Type: application/json
    Date: Fri, 09 May 2014 15:14:19 GMT
    Server: CouchDB (Erlang OTP)

    {
        "history": [
            {
                "doc_write_failures": 2,
                "docs_read": 2,
                "docs_written": 0,
                "end_last_seq": 2939,
                "end_time": "Fri, 09 May 2014 15:14:19 GMT",
                "missing_checked": 1835,
                "missing_found": 2,
                "recorded_seq": 2939,
                "session_id": "05918159f64842f1fe73e9e2157b2112",
                "start_last_seq": 0,
                "start_time": "Fri, 09 May 2014 15:14:18 GMT"
            }
        ],
        "ok": true,
        "replication_id_version": 3,
        "session_id": "05918159f64842f1fe73e9e2157b2112",
        "source_last_seq": 2939
    }

    2.4.2.5. Replicate Changes

    1. + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
    2. ' Locate Changed Documents: '
    3. ' '
    4. ' +-------------------------------------+ '
    5. ' | Any Differences Found? | '
    6. ' +-------------------------------------+ '
    7. ' | '
    8. ' | '
    9. ' | '
    10. + - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - +
    11. |
    12. + - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - +
    13. ' Replicate Changes: | '
    14. ' v '
    15. ' +-------------------------------------+ '
    16. ' +---------> | Fetch Next Changed Document | <---------------------+ '
    17. ' | +-------------------------------------+ | '
    18. ' | | GET /source/docid | | '
    19. ' | +-------------------------------------+ | '
    20. ' | | | '
    21. ' | | | '
    22. ' | | 201 Created | '
    23. ' | | 200 OK 401 Unauthorized | '
    24. ' | | 403 Forbidden | '
    25. ' | | | '
    26. ' | v | '
    27. ' | +-------------------------------------+ | '
    28. ' | +------ | Document Has Changed Attachments? | | '
    29. ' | | +-------------------------------------+ | '
    30. ' | | | | '
    31. ' | | | | '
    32. ' | | | Yes | '
    33. ' | | | | '
    34. ' | | v | '
    35. ' | | +------------------------+ Yes +---------------------------+ '
    36. ' | | No | Are They Big Enough? | -------> | Update Document on Target | '
    37. ' | | +------------------------+ +---------------------------+ '
    38. ' | | | | PUT /target/docid | '
    39. ' | | | +---------------------------+ '
    40. ' | | | '
    41. ' | | | No '
    42. ' | | | '
    43. ' | | v '
    44. ' | | +-------------------------------------+ '
    45. ' | +-----> | Put Document Into the Stack | '
    46. ' | +-------------------------------------+ '
    47. ' | | '
    48. ' | | '
    49. ' | v '
    50. ' | No +-------------------------------------+ '
    51. ' +---------- | Stack is Full? | '
    52. ' | +-------------------------------------+ '
    53. ' | | '
    54. ' | | Yes '
    55. ' | | '
    56. ' | v '
    57. ' | +-------------------------------------+ '
    58. ' | | Upload Stack of Documents to Target | '
    59. ' | +-------------------------------------+ '
    60. ' | | POST /target/_bulk_docs | '
    61. ' | +-------------------------------------+ '
    62. ' | | '
    63. ' | | 201 Created '
    64. ' | v '
    65. ' | +-------------------------------------+ '
    66. ' | | Ensure in Commit | '
    67. ' | +-------------------------------------+ '
    68. ' | | POST /target/_ensure_full_commit | '
    69. ' | +-------------------------------------+ '
    70. ' | | '
    71. ' | | 201 Created '
    72. ' | v '
    73. ' | +-------------------------------------+ '
    74. ' | | Record Replication Checkpoint | '
    75. ' | +-------------------------------------+ '
    76. ' | | PUT /source/_local/replication-id | '
    77. ' | | PUT /target/_local/replication-id | '
    78. ' | +-------------------------------------+ '
    79. ' | | '
    80. ' | | 201 Created '
    81. ' | v '
    82. ' | No +-------------------------------------+ '
    83. ' +---------- | All Documents from Batch Processed? | '
    84. ' +-------------------------------------+ '
    85. ' | '
    86. ' Yes | '
    87. ' | '
    88. + - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - +
    89. |
    90. + - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - +
    91. ' Locate Changed Documents: | '
    92. ' v '
    93. ' +-------------------------------------+ '
    94. ' | Listen to Changes Feed | '
    95. ' +-------------------------------------+ '
    96. ' '
    97. + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +

    2.4.2.5.1. Fetch Changed Documents

    At this step the Replicator MUST fetch all Document Leaf Revisions from Source that are missing on Target. This operation is efficient when the Replicator uses the previously calculated Revision differences, since they define the missing Documents and their Revisions.

    To fetch a Document the Replicator makes a GET /{db}/{docid} request with the following query parameters:

    • revs=true: Instructs the Source to include the list of all known revisions in the Document’s _revisions field. This information is needed to synchronize the Document’s ancestor history between Source and Target
    • open_revs: A JSON array listing the Leaf Revisions to be fetched. If the specified Revision exists, then the Document MUST be returned for this Revision. Otherwise, Source MUST return an object with a single field, missing, with the missing Revision as its value. If the Document contains attachments, Source MUST return information only for those that have been changed (added or updated) since the specified Revision values. If an attachment was deleted, the Document MUST NOT have stub information for it
    • latest=true: Ensures that Source returns the latest Document Revision regardless of which one was specified in the open_revs query parameter. This parameter solves a race condition where the requested Document may change between this step and the handling of related events on the Changes Feed
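
    The three parameters above can be combined into a request URL as in the following sketch. The function name is hypothetical; the point of the example is that the open_revs value is a JSON array and therefore has to be percent-encoded when placed in the query string.

```python
import json
from urllib.parse import quote, urlencode


def document_url(source, doc_id, leaf_revisions):
    """Build the URL that fetches the given Leaf Revisions of one Document."""
    query = urlencode({
        "revs": "true",
        # JSON array of Revisions; urlencode escapes the brackets and quotes.
        "open_revs": json.dumps(leaf_revisions),
        "latest": "true",
    })
    # Escape the Document ID too, since it may contain reserved characters.
    return f"{source}/{quote(doc_id, safe='')}?{query}"
```

    The leaf_revisions argument would typically be the list of missing Revisions reported for that Document by the earlier Revision difference step.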

    In the response Source SHOULD return multipart/mixed, or respond instead with application/json, unless the Accept header specifies a different MIME type. The multipart/mixed content type allows handling the response data as a stream, since there could be multiple documents (one per Leaf Revision) plus several attachments. These attachments are mostly binary, and JSON has no way to handle such data except as base64-encoded strings, which are very inefficient for transfer and processing operations.

    With a multipart/mixed response the Replicator handles multiple Document Leaf Revisions and their attachments one by one as raw data without any additional encoding applied. There is also one convention that makes data processing more efficient: the Document ALWAYS goes before its attachments, so the Replicator has no need to process all the data to map Documents to their attachments and may handle the response as a stream with a smaller memory footprint.
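
    As a rough illustration of the part structure, the sketch below splits an already buffered multipart/mixed body with Python’s standard email parser. It does not stream, so it only shows how parts map to Documents; the sample body, the boundary string and the helper name iter_json_parts are hypothetical.

```python
import json
from email import message_from_bytes


def iter_json_parts(content_type, body):
    """Yield the decoded JSON object of every application/json part."""
    # The email parser expects headers first, so prepend the Content-Type
    # header that carries the multipart boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    for part in message_from_bytes(raw).walk():
        if part.get_content_type() == "application/json":
            yield json.loads(part.get_payload(decode=True))


# Hypothetical response body: one Revision that was found, one that is missing.
BODY = (
    b"--boundary\r\n"
    b"Content-Type: application/json\r\n\r\n"
    b'{"_id": "SpaghettiWithMeatballs", "_rev": "1-917fa2381192822767f010b95b45325b"}\r\n'
    b"--boundary\r\n"
    b"Content-Type: application/json\r\n\r\n"
    b'{"missing": "1-34c318924a8f327223eed702ddfdc66d"}\r\n'
    b"--boundary--\r\n"
)

parts = list(iter_json_parts('multipart/mixed; boundary="boundary"', BODY))
found = [p for p in parts if "missing" not in p]
missing = [p["missing"] for p in parts if "missing" in p]
print(len(found), missing)
```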

    After receiving the response, the Replicator puts all the received data into a local stack for further bulk upload, to utilize network bandwidth effectively. The local stack size could be limited by the number of Documents or by the bytes of handled JSON data. When the stack is full the Replicator uploads all the handled Documents in bulk mode to the Target. While bulk operations are highly RECOMMENDED, in certain cases the Replicator MAY upload Documents to Target one by one.
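
    The local stack described above could look like the following sketch. The class name UploadBuffer, its default limits and the flush callback are all assumptions; a real Replicator would pass a function that performs the bulk upload.

```python
import json


class UploadBuffer:
    """Collects Documents until a count limit or a byte limit is reached."""

    def __init__(self, flush, max_docs=100, max_bytes=1_000_000):
        self._flush = flush          # callable that receives a list of Documents
        self._max_docs = max_docs
        self._max_bytes = max_bytes
        self._docs = []
        self._size = 0

    def add(self, doc):
        self._docs.append(doc)
        # Measure the Document by the size of its JSON encoding.
        self._size += len(json.dumps(doc).encode("utf-8"))
        if len(self._docs) >= self._max_docs or self._size >= self._max_bytes:
            self.flush()

    def flush(self):
        if self._docs:
            batch, self._docs, self._size = self._docs, [], 0
            self._flush(batch)


uploaded = []
buffer = UploadBuffer(uploaded.append, max_docs=2)
for doc_id in ("SpaghettiWithMeatballs", "LambStew", "FishStew"):
    buffer.add({"_id": doc_id})
buffer.flush()  # push the final, partially filled batch
print([len(batch) for batch in uploaded])  # [2, 1]
```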

    2.4.2.5.2. Upload Batch of Changed Documents

    To upload multiple Documents in a single shot the Replicator sends a POST /{db}/_bulk_docs request to Target with payload containing a JSON object with the following mandatory fields:

    • docs (array of objects): List of Document objects to update on Target. These Documents MUST contain the _revisions field that holds a list of the full Revision history to let Target create Leaf Revisions that correctly preserve ancestry
    • new_edits (boolean): Special flag that instructs Target to store Documents with the specified Revision (field _rev) value as-is without generating a new revision. Always false

    The request also MAY contain the X-Couch-Full-Commit header, which was used to control CouchDB <3.0 behavior when delayed commits were enabled. Other Peers MAY ignore this header or use it to control a similar local feature.

    Request:

    POST /target/_bulk_docs HTTP/1.1
    Accept: application/json
    Content-Length: 826
    Content-Type: application/json
    Host: localhost:5984
    User-Agent: CouchDB
    X-Couch-Full-Commit: false

    {
        "docs": [
            {
                "_id": "SpaghettiWithMeatballs",
                "_rev": "1-917fa2381192822767f010b95b45325b",
                "_revisions": {
                    "ids": [
                        "917fa2381192822767f010b95b45325b"
                    ],
                    "start": 1
                },
                "description": "An Italian-American delicious dish",
                "ingredients": [
                    "spaghetti",
                    "tomato sauce",
                    "meatballs"
                ],
                "name": "Spaghetti with meatballs"
            },
            {
                "_id": "LambStew",
                "_rev": "1-34c318924a8f327223eed702ddfdc66d",
                "_revisions": {
                    "ids": [
                        "34c318924a8f327223eed702ddfdc66d"
                    ],
                    "start": 1
                },
                "servings": 6,
                "subtitle": "Delicious with scone topping",
                "title": "Lamb Stew"
            },
            {
                "_id": "FishStew",
                "_rev": "1-9c65296036141e575d32ba9c034dd3ee",
                "_revisions": {
                    "ids": [
                        "9c65296036141e575d32ba9c034dd3ee"
                    ],
                    "start": 1
                },
                "servings": 4,
                "subtitle": "Delicious with fresh bread",
                "title": "Fish Stew"
            }
        ],
        "new_edits": false
    }
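
    A payload with the two mandatory fields could be assembled as in the sketch below. The helper name build_bulk_docs_payload is hypothetical, and the check for _revisions simply mirrors the requirement stated for the docs field.

```python
import json


def build_bulk_docs_payload(docs):
    """Serialize Documents for a _bulk_docs upload in replication mode."""
    for doc in docs:
        # Target needs the Revision history to rebuild the ancestry.
        if "_revisions" not in doc:
            raise ValueError(f"Document {doc.get('_id')!r} has no _revisions field")
    # new_edits is always false so that Target stores the given _rev values as-is.
    return json.dumps({"docs": docs, "new_edits": False})


payload = build_bulk_docs_payload([
    {
        "_id": "LambStew",
        "_rev": "1-34c318924a8f327223eed702ddfdc66d",
        "_revisions": {"ids": ["34c318924a8f327223eed702ddfdc66d"], "start": 1},
        "title": "Lamb Stew",
    }
])
print(payload)
```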

    In its response Target MUST return a JSON array with a list of Document update statuses. If the Document has been stored successfully, the list item MUST contain the field ok with true value. Otherwise it MUST contain error and reason fields with error type and a human-friendly reason description.

    Document updating failure isn’t fatal as Target MAY reject the update for its own reasons. It’s RECOMMENDED to use error type forbidden for rejections, but other error types can also be used (such as an invalid field name). The Replicator SHOULD NOT retry uploading rejected documents unless there are good reasons for doing so (e.g. there is a special error type for that).

    Note that while an update may fail for one Document in the response, Target can still return a response. The same is true if the updates fail for all uploaded Documents.

    Response:

    HTTP/1.1 201 Created
    Cache-Control: must-revalidate
    Content-Length: 246
    Content-Type: application/json
    Date: Sun, 10 Nov 2013 19:02:26 GMT
    Server: CouchDB (Erlang/OTP)

    [
        {
            "ok": true,
            "id": "SpaghettiWithMeatballs",
            "rev": "1-917fa2381192822767f010b95b45325b"
        },
        {
            "ok": true,
            "id": "FishStew",
            "rev": "1-9c65296036141e575d32ba9c034dd3ee"
        },
        {
            "error": "forbidden",
            "id": "LambStew",
            "reason": "sorry",
            "rev": "1-34c318924a8f327223eed702ddfdc66d"
        }
    ]
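
    Because a successful response may still report individual failures, the Replicator has to inspect every list item. A minimal sketch, assuming a hypothetical helper split_bulk_results and using the statuses from the response above:

```python
def split_bulk_results(results):
    """Separate stored Documents from rejected ones in a _bulk_docs response."""
    written, failed = [], []
    for item in results:
        # A stored Document carries ok=true; a failure carries error and reason.
        if item.get("ok") is True:
            written.append(item)
        else:
            failed.append(item)
    return written, failed


results = [
    {"ok": True, "id": "SpaghettiWithMeatballs",
     "rev": "1-917fa2381192822767f010b95b45325b"},
    {"ok": True, "id": "FishStew",
     "rev": "1-9c65296036141e575d32ba9c034dd3ee"},
    {"error": "forbidden", "id": "LambStew", "reason": "sorry",
     "rev": "1-34c318924a8f327223eed702ddfdc66d"},
]
written, failed = split_bulk_results(results)
print(len(written), [(f["id"], f["error"]) for f in failed])
```

    The number of failed items is the kind of value a Replicator could record as doc_write_failures in its Replication Log.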

    2.4.2.5.3. Upload Document with Attachments

    There is a special optimization case in which the Replicator will not use bulk upload of changed Documents. It applies when Documents contain many attached files or when the files are too big to be efficiently encoded with Base64.

    For this case the Replicator issues a request with multipart/related content type. Such a request allows one to easily stream the Document and all its attachments one by one without any serialization overhead.
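
    The sketch below shows one way such a body could be laid out: the Document first, then each attachment as raw bytes. It builds the body in memory for brevity rather than streaming it. The helper name, the boundary string and the recipe.txt attachment are hypothetical, and the _attachments stub fields (content_type, length, follows) and the Content-Disposition header are assumptions about what Target expects, not something this section defines.

```python
import json


def build_multipart_related(doc, attachments, boundary="replication-boundary"):
    """Return (content_type, body) for a Document followed by its attachments.

    attachments maps a file name to a (content_type, data) pair.
    """
    doc = dict(doc)
    # Assumption: each attachment is announced by a stub in _attachments.
    doc["_attachments"] = {
        name: {"content_type": ctype, "length": len(data), "follows": True}
        for name, (ctype, data) in attachments.items()
    }
    delimiter = b"--" + boundary.encode("ascii")
    # The Document goes first; every attachment follows without Base64 encoding.
    lines = [delimiter, b"Content-Type: application/json", b"",
             json.dumps(doc).encode("utf-8")]
    for name, (ctype, data) in attachments.items():
        lines += [
            delimiter,
            b'Content-Disposition: attachment; filename="' + name.encode("utf-8") + b'"',
            b"Content-Type: " + ctype.encode("ascii"),
            b"",
            data,
        ]
    lines.append(delimiter + b"--")
    return f'multipart/related; boundary="{boundary}"', b"\r\n".join(lines)


content_type, body = build_multipart_related(
    {"_id": "SpaghettiWithMeatballs", "_rev": "7-474f12eb068c717243487a9505f6123b"},
    {"recipe.txt": ("text/plain", b"1. Cook spaghetti\n2. Add meatballs\n")},
)
print(content_type)
```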

    Request:

    Response:

    HTTP/1.1 201 Created
    Cache-Control: must-revalidate
    Content-Length: 105
    Content-Type: application/json
    Date: Fri, 08 Nov 2013 16:35:27 GMT
    Server: CouchDB (Erlang/OTP)

    {
        "ok": true,
        "id": "SpaghettiWithMeatballs",
        "rev": "7-474f12eb068c717243487a9505f6123b"
    }

    Unlike bulk updating via the POST /{db}/_bulk_docs endpoint, the response MAY come with a different status code. For instance, when the Document is rejected, Target SHOULD respond with a 403 Forbidden:

    Response:

    HTTP/1.1 403 Forbidden
    Cache-Control: must-revalidate
    Content-Length: 39
    Content-Type: application/json
    Date: Fri, 08 Nov 2013 16:35:27 GMT
    Server: CouchDB (Erlang/OTP)

    {
        "error": "forbidden",
        "reason": "sorry"
    }

    The Replicator SHOULD NOT retry requests in case of a 401 Unauthorized, 403 Forbidden, 409 Conflict or 412 Precondition Failed response, since repeating the request cannot solve a problem with user credentials or uploaded data.
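
    A retry decision along these lines might be sketched as follows. The exact set of non-retryable codes (401, 403, 409 and 412), the treatment of 5xx responses as transient and the attempt limit are assumptions of this sketch.

```python
# Assumed set of responses that point at credentials or data problems.
NON_RETRYABLE = frozenset({401, 403, 409, 412})


def should_retry(status, attempt, max_attempts=5):
    """Decide whether a failed upload request is worth repeating."""
    if status in NON_RETRYABLE:
        return False  # repeating the same request cannot fix credentials or data
    if status >= 500:
        return attempt < max_attempts  # assumed transient server-side failure
    return False  # success and any other client error: nothing to retry


print(should_retry(403, attempt=1))  # False
print(should_retry(503, attempt=1))  # True
```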

    2.4.2.5.4. Ensure In Commit

    Once a batch of changes has been successfully uploaded to Target, the Replicator issues a request to ensure that every transferred bit is written to disk or other persistent storage. Target MUST return a 201 Created response with a JSON object containing the following mandatory fields:

    • instance_start_time (string): Timestamp of when the database was opened, expressed in microseconds since the epoch

    • ok (boolean): Operation status. Constantly true

      Request:

      POST /target/_ensure_full_commit HTTP/1.1
      Accept: application/json
      Content-Type: application/json
      Host: localhost:5984

      Response:

      HTTP/1.1 201 Created
      Cache-Control: must-revalidate
      Content-Length: 53
      Content-Type: application/json
      Date: Wed, 06 Nov 2013 18:20:43 GMT
      Server: CouchDB (Erlang/OTP)

      {
          "instance_start_time": "0",
          "ok": true
      }
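
    A Replicator could verify the commit response before moving on to the checkpoint step. The following sketch assumes a hypothetical helper check_commit_response that receives the status code and the decoded JSON body:

```python
def check_commit_response(status, body):
    """Validate an _ensure_full_commit response and return instance_start_time."""
    if status != 201:
        raise RuntimeError(f"commit not confirmed, got HTTP {status}")
    if body.get("ok") is not True:
        raise RuntimeError("commit response lacks ok=true")
    start_time = body.get("instance_start_time")
    if not isinstance(start_time, str):
        raise RuntimeError("instance_start_time must be a string")
    return start_time


print(check_commit_response(201, {"instance_start_time": "0", "ok": True}))  # 0
```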

    2.4.2.5.5. Record Replication Checkpoint

    Once batches of changes have been uploaded and committed successfully, the Replicator updates the Replication Log on both Source and Target, recording the current Replication state. This operation is REQUIRED so that in the case of Replication failure the replication can resume from the last point of success, not from the very beginning.

    Replicator updates Replication Log on Source:

    Request:

    PUT /source/_local/afa899a9e59589c3d4ce5668e3218aef HTTP/1.1
    Accept: application/json
    Content-Length: 591
    Content-Type: application/json
    Host: localhost:5984
    User-Agent: CouchDB

    {
        "_id": "_local/afa899a9e59589c3d4ce5668e3218aef",
        "_rev": "0-1",
        "_revisions": {
            "ids": [
                "31f36e40158e717fbe9842e227b389df"
            ],
            "start": 1
        },
        "history": [
            {
                "doc_write_failures": 0,
                "docs_read": 6,
                "docs_written": 6,
                "end_last_seq": 26,
                "end_time": "Thu, 07 Nov 2013 09:42:17 GMT",
                "missing_checked": 6,
                "missing_found": 6,
                "recorded_seq": 26,
                "session_id": "04bf15bf1d9fa8ac1abc67d0c3e04f07",
                "start_last_seq": 0,
                "start_time": "Thu, 07 Nov 2013 09:41:43 GMT"
            }
        ],
        "replication_id_version": 3,
        "session_id": "04bf15bf1d9fa8ac1abc67d0c3e04f07",
        "source_last_seq": 26
    }

    Response:

    HTTP/1.1 201 Created
    Cache-Control: must-revalidate
    Content-Length: 75
    Content-Type: application/json
    Date: Thu, 07 Nov 2013 09:42:17 GMT
    Server: CouchDB (Erlang/OTP)

    {
        "id": "_local/afa899a9e59589c3d4ce5668e3218aef",
        "ok": true,
        "rev": "0-2"
    }

    …and on Target too:

    Request:

    PUT /target/_local/afa899a9e59589c3d4ce5668e3218aef HTTP/1.1
    Accept: application/json
    Content-Length: 591
    Content-Type: application/json
    Host: localhost:5984
    User-Agent: CouchDB

    {
        "_id": "_local/afa899a9e59589c3d4ce5668e3218aef",
        "_rev": "1-31f36e40158e717fbe9842e227b389df",
        "_revisions": {
            "ids": [
                "31f36e40158e717fbe9842e227b389df"
            ],
            "start": 1
        },
        "history": [
            {
                "doc_write_failures": 0,
                "docs_read": 6,
                "docs_written": 6,
                "end_last_seq": 26,
                "end_time": "Thu, 07 Nov 2013 09:42:17 GMT",
                "missing_checked": 6,
                "missing_found": 6,
                "recorded_seq": 26,
                "session_id": "04bf15bf1d9fa8ac1abc67d0c3e04f07",
                "start_last_seq": 0,
                "start_time": "Thu, 07 Nov 2013 09:41:43 GMT"
            }
        ],
        "replication_id_version": 3,
        "session_id": "04bf15bf1d9fa8ac1abc67d0c3e04f07",
        "source_last_seq": 26
    }

    Response:

    HTTP/1.1 201 Created
    Cache-Control: must-revalidate
    Content-Length: 106
    Content-Type: application/json
    Date: Thu, 07 Nov 2013 09:42:17 GMT
    Server: CouchDB (Erlang/OTP)

    {
        "id": "_local/afa899a9e59589c3d4ce5668e3218aef",
        "ok": true,
        "rev": "2-9b5d1e36bed6ae08611466e30af1259a"
    }
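
    The Replication Log documents in the requests above could be produced by a helper like the one sketched here. The function name next_checkpoint, the newest-first ordering of history and the max_history cap are assumptions; the field names come from the examples above.

```python
def next_checkpoint(replication_id, previous, entry, max_history=50):
    """Build the next Replication Log document from the latest history entry."""
    # Assumption: newest entries go first and old ones are eventually dropped.
    history = [entry] + list((previous or {}).get("history", []))
    doc = {
        "_id": "_local/" + replication_id,
        "history": history[:max_history],
        "replication_id_version": 3,
        "session_id": entry["session_id"],
        "source_last_seq": entry["recorded_seq"],
    }
    # Updating an existing log requires the Revision of the stored document.
    if previous and "_rev" in previous:
        doc["_rev"] = previous["_rev"]
    return doc


entry = {
    "session_id": "04bf15bf1d9fa8ac1abc67d0c3e04f07",
    "start_last_seq": 0,
    "end_last_seq": 26,
    "recorded_seq": 26,
    "docs_read": 6,
    "docs_written": 6,
    "doc_write_failures": 0,
}
first = next_checkpoint("afa899a9e59589c3d4ce5668e3218aef", None, entry)
print(first["_id"], first["source_last_seq"])
```

    The same document body is sent to both Source and Target; only the _rev value may differ between the two, as in the requests above.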

    2.4.2.6. Continue Reading Changes

    Once a batch of changes has been processed and transferred to Target successfully, the Replicator can continue to listen to the Changes Feed for new changes. If there are no new changes to process, the Replication is considered to be done.

    For Continuous Replication, the Replicator MUST continue to wait for new changes from Source.

    Since the CouchDB Replication Protocol works on top of HTTP, which is based on TCP/IP, the Replicator SHOULD expect to be working within an unstable environment with delays, losses and other bad surprises that might eventually occur. The Replicator SHOULD NOT count every HTTP request failure as a fatal error. It SHOULD be smart enough to detect timeouts, repeat failed requests, be ready to process incomplete or malformed data and so on. Data must flow - that’s the rule.
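
    One common way to tolerate such failures is to repeat a request with a growing delay. The sketch below is only an illustration: the helper name with_retries, the exception types treated as transient and the delay values are assumptions.

```python
import time


def with_retries(request, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call request() again after transient failures, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return request()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake request that fails twice before succeeding.
calls = {"count": 0}
delays = []


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated network timeout")
    return "ok"


result = with_retries(flaky_request, sleep=delays.append)
print(result, delays)  # ok [0.5, 1.0]
```

    Passing the sleep function as a parameter keeps the example fast to run and makes the waiting policy easy to replace.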

    2.4.4. Error Responses

    In case something goes wrong the Peer MUST respond with a JSON object with the following REQUIRED fields:

    • error (string): Error type for programs and developers
    • reason (string): Error description for humans
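
    On the receiving side, a client could turn such a response into an exception. This is a minimal sketch; the PeerError class, the parse_error helper and the fallback values used for malformed bodies are hypothetical.

```python
import json


class PeerError(Exception):
    """Carries the HTTP status together with the error and reason fields."""

    def __init__(self, status, error, reason):
        super().__init__(f"{status} {error}: {reason}")
        self.status = status
        self.error = error
        self.reason = reason


def parse_error(status, raw_body):
    """Build a PeerError from an error response, tolerating malformed bodies."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        payload = None
    if not isinstance(payload, dict):
        payload = {}
    return PeerError(status, payload.get("error", "unknown"), payload.get("reason", ""))


err = parse_error(409, '{"error": "conflict", "reason": "document update conflict"}')
print(err)  # 409 conflict: document update conflict
```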

    2.4.4.1. Bad Request

    If a request contains malformed data (like invalid JSON) the Peer MUST respond with an HTTP 400 Bad Request and bad_request as error type:

    {
        "error": "bad_request",
        "reason": "invalid json"
    }

    2.4.4.2. Unauthorized

    If a Peer REQUIRES credentials to be included with the request and the request does not contain acceptable credentials, then the Peer MUST respond with an HTTP 401 Unauthorized and unauthorized as error type:

    {
        "error": "unauthorized",
        "reason": "Name or password is incorrect"
    }

    2.4.4.3. Forbidden

    If a Peer receives valid user credentials, but the requester does not have sufficient permissions to perform the operation, then the Peer MUST respond with an HTTP 403 Forbidden and forbidden as error type:

    {
        "error": "forbidden",
        "reason": "You may only update your own user document."
    }

    2.4.4.4. Not Found

    If the requested resource, Database or Document wasn’t found on a Peer, the Peer MUST respond with an HTTP 404 Not Found and not_found as error type:

    {
        "error": "not_found",
        "reason": "database \"target\" does not exists"
    }

    2.4.4.5. Method Not Allowed

    If an unsupported method was used then the Peer MUST respond with an HTTP 405 Method Not Allowed and method_not_allowed as error type:

    {
        "error": "method_not_allowed",
        "reason": "Only GET, PUT, DELETE allowed"
    }

    2.4.4.6. Resource Conflict

    A resource conflict error occurs when there are concurrent updates of the same resource by multiple clients. In this case the Peer MUST respond with an HTTP 409 Conflict and conflict as error type:

    {
        "error": "conflict",
        "reason": "document update conflict"
    }

    2.4.4.7. Precondition Failed

    The HTTP 412 Precondition Failed response may be sent in case of an attempt to create a Database that already exists (error type db_exists) or when some attachment information is missing (error type missing_stub). There are no explicit error type restrictions, but it is RECOMMENDED to use the error types mentioned above:

    {
        "error": "db_exists",
        "reason": "database \"target\" exists"
    }

    2.4.4.8. Server Error

    {
        "error": "worker_died",
    }

    2.4.5. Optimisations

    The following approaches are RECOMMENDED to optimize the Replication process:

    • Keep the number of HTTP requests at a reasonable minimum
    • Try to work with a connection pool and make parallel/multiple requests whenever possible
    • Don’t close sockets after each request: respect the keep-alive option
    • Use continuous sessions (cookies, etc.) to reduce authentication overhead
    • Try to use bulk requests for every operation with Documents
    • Find the optimal batch size for Changes Feed processing
    • Preserve Replication Logs and resume Replication from the last Checkpoint whenever possible
    • Optimize filter functions: let them run as fast as possible
    • Get ready for surprises: networks are very unstable environments

    2.4.6. API Reference

    2.4.6.1. Common Methods

    • PUT /{db} – Create Target if it does not exist and the option was provided
    • – Locate Revisions that are not known to Target
    • POST /{db}/_bulk_docs – Upload Revisions to Target
    • – Upload a single Document with attachments to Target
    • POST /{db}/_ensure_full_commit – Ensure that all changes are stored on disk

    2.4.6.3. For Source

    • GET /{db}/_changes – Fetch changes since the last pull of Source
    • – Fetch changes for specified Document IDs since the last pull of Source
    • GET /{db}/{docid} – Retrieve a single Document from Source with attachments