Reindex data

    With the reindex operation, you can copy all documents, or a subset that you select through a query, from one index to another. Reindex is a POST operation. In its most basic form, you specify a source index and a destination index.

    Reindexing can be an expensive operation depending on the size of your source index. We recommend you disable replicas in your destination index by setting number_of_replicas to 0 and re-enable them once the reindex process is complete.
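
    As a rough sketch of this recommendation, the replica count can be changed through the index settings API before and after the reindex. The index name destination and the final replica count of 1 are assumptions for illustration; use the values that fit your cluster.

    ```json
    # Before reindexing: disable replicas on the (assumed) destination index.
    PUT destination/_settings
    {
      "index": {
        "number_of_replicas": 0
      }
    }

    # After reindexing: restore replicas (1 is an assumed value).
    PUT destination/_settings
    {
      "index": {
        "number_of_replicas": 1
      }
    }
    ```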



    Reindex all documents

    You can copy all documents from one index to another.

    Before you reindex, you can create a destination index with your desired field mappings and settings, or copy the ones from your source index.
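
    A minimal sketch of creating the destination index up front might look like the following. The shard count and the title and created_at fields are hypothetical placeholders, not values taken from this guide; substitute the mappings and settings of your own source index.

    ```json
    # Hypothetical settings and mappings; replace with your own.
    PUT destination
    {
      "settings": {
        "index": {
          "number_of_shards": 1,
          "number_of_replicas": 0
        }
      },
      "mappings": {
        "properties": {
          "title": { "type": "text" },
          "created_at": { "type": "date" }
        }
      }
    }
    ```

    To see what to copy, GET source/_mapping and GET source/_settings return the current configuration of the source index.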

    This reindex command copies all the documents from a source index to a destination index:

    POST _reindex
    {
      "source": {
        "index": "source"
      },
      "dest": {
        "index": "destination"
      }
    }

    If the destination index does not already exist, the reindex operation creates it with default configurations.

    Reindex from a remote cluster

    You can copy documents from an index in a remote cluster. Use the remote option to specify the remote hostname and the required login credentials.

    This command reaches out to a remote cluster, logs in with the username and password, and copies all the documents from the source index in that remote cluster to the destination index in your local cluster:

    POST _reindex
    {
      "source": {
        "remote": {
          "host": "https://<REST_endpoint_of_remote_cluster>:9200",
          "username": "YOUR_USERNAME",
          "password": "YOUR_PASSWORD"
        },
        "index": "source"
      },
      "dest": {
        "index": "destination"
      }
    }

    Reindex a subset of documents

    You can copy a specific set of documents that match a search query.

    To do so, add a query to the source object. Only the documents that the query matches are copied to the destination index.
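
    As an illustration, a reindex request that filters the source with a match query could look like this. The field name field_name and the search text are placeholders, not names defined by this guide.

    ```json
    # field_name and "text" are hypothetical; replace with your own field and search terms.
    POST _reindex
    {
      "source": {
        "index": "source",
        "query": {
          "match": {
            "field_name": "text"
          }
        }
      },
      "dest": {
        "index": "destination"
      }
    }
    ```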

    For a list of all query operations, see Full-text queries.

    Combine one or more indices

    You can combine documents from one or more indices by adding the source indices as a list.

    This command copies all documents from two source indices to one destination index:

    POST _reindex
    {
      "source": {
        "index": [
          "source_1",
          "source_2"
        ]
      },
      "dest": {
        "index": "destination"
      }
    }

    Make sure that your source and destination indices have the same number of shards.

    Reindex only unique documents

    You can copy only documents missing from a destination index by setting the op_type option to create. In this case, if a document with the same ID already exists, the operation ignores the one from the source index. To ignore all version conflicts of documents, set the conflicts option to proceed.

    POST _reindex
    {
      "conflicts": "proceed",
      "source": {
        "index": "source"
      },
      "dest": {
        "index": "destination",
        "op_type": "create"
      }
    }

    Transform documents during reindexing

    You can transform your data during the reindexing process using the script option. We recommend Painless for scripting in OpenSearch.
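
    A minimal sketch of the script option follows. It assumes that every source document has a numeric field named account_number, which is a hypothetical field used only for illustration, and increments that field as each document is copied.

    ```json
    # account_number is a hypothetical numeric field assumed to exist in every document.
    POST _reindex
    {
      "source": {
        "index": "source"
      },
      "dest": {
        "index": "destination"
      },
      "script": {
        "lang": "painless",
        "source": "ctx._source.account_number++"
      }
    }
    ```

    Inside a reindex script, the document body is available as ctx._source, so any field you change there is written to the destination index in its modified form.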

    You can also specify an ingest pipeline to transform your data during the reindexing process.

    First, create a pipeline and define its processors. A number of different processors are available for use in an ingest pipeline.

    The following sample ingest pipeline defines three processors. The split processor splits the text field on whitespace and stores the result in a new word field. The script processor runs a Painless script that finds the length of the word field and stores it in a new word_count field. The remove processor removes the test field.

    PUT _ingest/pipeline/pipeline-test
    {
      "description": "Splits the text field into a list. Computes the length of the 'word' field and stores it in a new 'word_count' field. Removes the 'test' field.",
      "processors": [
        {
          "split": {
            "field": "text",
            "separator": "\\s+",
            "target_field": "word"
          }
        },
        {
          "script": {
            "lang": "painless",
            "source": "ctx.word_count = ctx.word.length"
          }
        },
        {
          "remove": {
            "field": "test"
          }
        }
      ]
    }

    After creating a pipeline, you can use the reindex operation:

    POST _reindex
    {
      "source": {
        "index": "source"
      },
      "dest": {
        "index": "destination",
        "pipeline": "pipeline-test"
      }
    }

    Update documents in the current index

    To update the data in your current index itself without copying it to a different index, use the update_by_query operation.

    The update_by_query operation is a POST operation that you can perform on a single index at a time.

    If you run update_by_query with no parameters, it increments the version number of every document in the index.
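
    The two forms can be sketched as follows, using the index name source from the earlier examples. The status field and its values are hypothetical, and the term query assumes that status is mapped as a keyword field.

    ```json
    # No request body: only the version number of each document is incremented.
    POST source/_update_by_query

    # With a query and a script: "status" and its values are hypothetical.
    POST source/_update_by_query
    {
      "query": {
        "term": {
          "status": "draft"
        }
      },
      "script": {
        "lang": "painless",
        "source": "ctx._source.status = 'review'"
      }
    }
    ```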

    Source index options

    You can specify the following options for your source index:

    Destination index options