Take and restore snapshots

    OpenSearch snapshots are incremental, meaning that they only store data that has changed since the last successful snapshot. The difference in disk usage between frequent and infrequent snapshots is often minimal.

    In other words, taking hourly snapshots for a week (for a total of 168 snapshots) might not use much more disk space than taking a single snapshot at the end of the week. Also, the more frequently you take snapshots, the less time they take to complete. Some OpenSearch users take snapshots as often as every 30 minutes.

    If you need to delete a snapshot, be sure to use the OpenSearch API rather than navigating to the storage location and purging files. Incremental snapshots from a cluster often share a lot of the same data; when you use the API, OpenSearch only removes data that no other snapshot is using.
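As a rough illustration, the following Python sketch builds the DELETE request for a single snapshot using only the standard library. The base URL, the repository name my-repository, and the helper name are assumptions made for this example. A secured cluster would also need credentials and TLS settings, which are omitted here.

```python
import urllib.parse
import urllib.request

# Assumption: an unsecured cluster listening locally; adjust for your setup.
BASE_URL = "http://localhost:9200"


def build_delete_snapshot_request(repository: str, snapshot: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for one snapshot."""
    repo = urllib.parse.quote(repository, safe="")
    snap = urllib.parse.quote(snapshot, safe="")
    return urllib.request.Request(f"{BASE_URL}/_snapshot/{repo}/{snap}", method="DELETE")


request = build_delete_snapshot_request("my-repository", "1")
print(request.get_method(), request.full_url)
# To send it against a running cluster: urllib.request.urlopen(request)
```

Going through the API this way leaves it to OpenSearch to decide which files are still referenced by other snapshots.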



    Before you can take a snapshot, you have to “register” a snapshot repository. A snapshot repository is just a storage location: a shared file system, Amazon S3, Hadoop Distributed File System (HDFS), Azure Storage, etc.

    1. To use a shared file system as a snapshot repository, add its path to the path.repo setting in opensearch.yml:

      On the RPM and Debian installs, you can then mount the file system. If you’re using the Docker install, add the file system to each node in docker-compose.yml before starting the cluster:

      volumes:
        - /Users/jdoe/snapshots:/mnt/snapshots
    2. Then register the repository using the REST API:

      PUT _snapshot/my-fs-repository
      {
        "type": "fs",
        "settings": {
          "location": "/mnt/snapshots"
        }
      }

      If the request is successful, the response from OpenSearch is minimal:

      {
        "acknowledged": true
      }
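The registration request can also be sent from a script. The sketch below, which is only one possible approach, builds the same PUT request with Python's standard library. The base URL and the helper name are assumptions for the example, and authentication is left out.

```python
import json
import urllib.request

# Assumption: an unsecured cluster listening locally; adjust for your setup.
BASE_URL = "http://localhost:9200"


def build_register_fs_repository_request(name: str, location: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that registers an fs repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return urllib.request.Request(
        f"{BASE_URL}/_snapshot/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_register_fs_repository_request("my-fs-repository", "/mnt/snapshots")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it against a running cluster: urllib.request.urlopen(request)
```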

    You probably only need to specify location, but the following table summarizes the options:

    1. To use an Amazon S3 bucket as a snapshot repository, install the repository-s3 plugin on all nodes:

      sudo ./bin/opensearch-plugin install repository-s3

      If you’re using the Docker installation, see Customize the Docker image. Your Dockerfile should look something like this:

      FROM opensearchproject/opensearch:2.1.0
      ENV AWS_ACCESS_KEY_ID <access-key>
      ENV AWS_SECRET_ACCESS_KEY <secret-key>
      # Optional
      ENV AWS_SESSION_TOKEN <optional-session-token>
      RUN /usr/share/opensearch/bin/opensearch-plugin install --batch repository-s3
      RUN /usr/share/opensearch/bin/opensearch-keystore create
      RUN echo $AWS_ACCESS_KEY_ID | /usr/share/opensearch/bin/opensearch-keystore add --stdin s3.client.default.access_key
      RUN echo $AWS_SECRET_ACCESS_KEY | /usr/share/opensearch/bin/opensearch-keystore add --stdin s3.client.default.secret_key
      # Optional
      RUN echo $AWS_SESSION_TOKEN | /usr/share/opensearch/bin/opensearch-keystore add --stdin s3.client.default.session_token

      After the Docker cluster starts, skip to step 7.

    2. Add your AWS access and secret keys to the OpenSearch keystore:

      sudo ./bin/opensearch-keystore add s3.client.default.access_key
      sudo ./bin/opensearch-keystore add s3.client.default.secret_key

      3. (Optional) If you connect to the internet through a proxy, add those credentials:

      4. (Optional) Add other settings to opensearch.yml:

        s3.client.default.disable_chunked_encoding: false # Disables chunked encoding for compatibility with some storage services, but you probably don't need to change this value.
        s3.client.default.endpoint: s3.amazonaws.com # S3 has alternate endpoints, but you probably don't need to change this value.
        s3.client.default.max_retries: 3 # number of retries if a request fails
        s3.client.default.path_style_access: false # whether to use the deprecated path-style bucket URLs.
        # You probably don't need to change this value, but for more information, see https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html#path-style-access.
        s3.client.default.protocol: https # http or https
        s3.client.default.proxy.host: my-proxy-host # the hostname for your proxy server
        s3.client.default.proxy.port: 8080 # port for your proxy server
        s3.client.default.read_timeout: 50s # the S3 connection timeout
        s3.client.default.use_throttle_retries: true # whether the client should wait a progressively longer amount of time (exponential backoff) between each successive retry
        s3.client.default.region: us-east-2 # AWS region to use
      5. (Optional) If you don’t want to use AWS access and secret keys, you could configure the S3 plugin to use AWS Identity and Access Management (IAM) roles for service accounts:

        sudo ./bin/opensearch-keystore add s3.client.default.role_arn
        sudo ./bin/opensearch-keystore add s3.client.default.role_session_name

        If you don’t want to configure AWS access and secret keys, modify the following opensearch.yml setting. Make sure the file is accessible by the repository-s3 plugin:

        s3.client.default.identity_token_file: /usr/share/opensearch/plugins/repository-s3/token

        IAM roles require at least one of the above settings. Other settings will be taken from environment variables (if available): AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE, AWS_ROLE_SESSION_NAME.

      6. If you changed opensearch.yml, you must restart each node in the cluster. Otherwise, you only need to reload secure cluster settings:

        POST _nodes/reload_secure_settings

      7. Create an S3 bucket if you don’t already have one. To take snapshots, you need permissions to access the bucket. The following IAM policy is an example of those permissions:

        {
          "Version": "2012-10-17",
          "Statement": [{
            "Action": [
              "s3:*"
            ],
            "Effect": "Allow",
            "Resource": [
              "arn:aws:s3:::your-bucket",
              "arn:aws:s3:::your-bucket/*"
            ]
          }]
        }
      8. Register the repository using the REST API:

        PUT _snapshot/my-s3-repository
        {
          "type": "s3",
          "settings": {
            "bucket": "my-s3-bucket",
            "base_path": "my/snapshot/directory"
          }
        }
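The bucket name ties the last two steps together: it appears in the resource ARNs of the IAM policy and in the repository settings. The following Python sketch derives both documents from one bucket name so the two stay in sync. The helper name and the returned dictionary layout are assumptions for illustration, not part of any OpenSearch or AWS API.

```python
import json


def build_s3_snapshot_config(bucket: str, base_path: str) -> dict:
    """Derive the example IAM policy and the repository body from one bucket name."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Action": ["s3:*"],
            "Effect": "Allow",
            "Resource": [
                f"arn:aws:s3:::{bucket}",      # the bucket itself
                f"arn:aws:s3:::{bucket}/*",    # every object in the bucket
            ],
        }],
    }
    repository = {
        "type": "s3",
        "settings": {"bucket": bucket, "base_path": base_path},
    }
    return {"policy": policy, "repository": repository}


config = build_s3_snapshot_config("my-s3-bucket", "my/snapshot/directory")
print(json.dumps(config["policy"], indent=2))      # attach to the IAM user or role
print(json.dumps(config["repository"], indent=2))  # body for PUT _snapshot/my-s3-repository
```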

      You probably don’t need to specify anything but bucket and base_path, but the following table summarizes the options:

      You specify two pieces of information when you create a snapshot:

      • Name of your snapshot repository
      • Name for the snapshot

      The following snapshot includes all indices and the cluster state:

      PUT _snapshot/my-repository/1

      You can also add a request body to include or exclude certain indices or specify other settings:

      If you request the snapshot immediately after taking it, you might see something like this:

      GET _snapshot/my-repository/2

      {
        "snapshots": [{
          "snapshot": "2",
          "version": "6.5.4",
          "indices": [
            "opensearch_dashboards_sample_data_ecommerce",
            "my-index",
            "opensearch_dashboards_sample_data_logs",
            "opensearch_dashboards_sample_data_flights"
          ],
          "include_global_state": true,
          "state": "IN_PROGRESS",
          ...
        }]
      }

      In this response, the snapshot is still in progress. To wait for the snapshot to finish before the request returns, add the wait_for_completion parameter:

      PUT _snapshot/my-repository/3?wait_for_completion=true

      Snapshots have the following states:

      You can’t take a snapshot if one is currently in progress. To check the status:

      GET _snapshot/_status
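A script that needs to wait for a snapshot can poll its state until it is no longer IN_PROGRESS. The following Python sketch shows one way to structure that loop. The fetch_state callable is a hypothetical stand-in for a real request such as GET _snapshot/my-repository/2 that reads the state field from the response; the polling interval and limit are arbitrary example values.

```python
import time
from typing import Callable


def wait_for_snapshot(
    fetch_state: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Poll until the snapshot leaves the IN_PROGRESS state, then return the final state."""
    for _ in range(max_polls):
        state = fetch_state()
        if state != "IN_PROGRESS":
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("snapshot still in progress after the polling limit")


# Stand-in for real API calls: two in-progress responses, then a finished one.
fake_states = iter(["IN_PROGRESS", "IN_PROGRESS", "SUCCESS"])
final_state = wait_for_snapshot(lambda: next(fake_states), poll_seconds=0)
print(final_state)
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client and makes it easy to exercise without a cluster.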

      The first step in restoring a snapshot is retrieving existing snapshots. To see all snapshot repositories:

      GET _snapshot/_all

      To see all snapshots in a repository:

      GET _snapshot/my-repository/_all

      Then restore a snapshot:

      POST _snapshot/my-repository/2/_restore

      Just like when taking a snapshot, you can add a request body to include or exclude certain indices or specify some other settings:

      POST _snapshot/my-repository/2/_restore
      {
        "indices": "opensearch-dashboards*,my-index*",
        "ignore_unavailable": true,
        "include_global_state": false,
        "include_aliases": false,
        "partial": false,
        "rename_pattern": "opensearch-dashboards(.+)",
        "rename_replacement": "restored-opensearch-dashboards$1",
        "index_settings": {
          "index.blocks.read_only": false
        },
        "ignore_index_settings": [
          "index.refresh_interval"
        ]
      }

      One way to avoid naming conflicts when restoring indices is to use the rename_pattern and rename_replacement options, which restore each matching index under a new name. Then, if necessary, you can use the _reindex API to combine the restored index with the existing one. The simpler way is to delete existing indices prior to restoring from a snapshot.
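To get a feel for how rename_pattern and rename_replacement interact, the following Python sketch previews the names that the example pattern would produce. It only approximates the behavior with Python's re module, which writes the first capture group as \1 where the request body uses $1. The index names passed in are hypothetical.

```python
import re

# Same pattern as the example request body.
RENAME_PATTERN = r"opensearch-dashboards(.+)"
# Python syntax for the first capture group is \1 (the request body uses $1).
RENAME_REPLACEMENT = r"restored-opensearch-dashboards\1"


def preview_restored_name(index_name: str) -> str:
    """Return the name an index would get after the rename, or the original name if the pattern does not match."""
    return re.sub(RENAME_PATTERN, RENAME_REPLACEMENT, index_name)


for name in ["opensearch-dashboards-logs", "my-index"]:
    print(name, "->", preview_restored_name(name))
```

Indices that do not match the pattern, such as my-index here, keep their original names and can still conflict with existing indices.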

      You can use the _close API to close existing indices prior to restoring from a snapshot, but the index in the snapshot has to have the same number of shards as the existing index.

      We recommend ceasing write requests to a cluster before restoring from a snapshot, which helps avoid scenarios such as:

      1. You delete an index, which also deletes its alias.
      2. A write request to the now-deleted alias creates a new index with the same name as the alias.
      3. The alias from the snapshot fails to restore due to a naming conflict with the new index.

      Snapshots are only forward-compatible by one major version. If you have an old snapshot, you can sometimes restore it into an intermediate cluster, reindex all indices, take a new snapshot, and repeat until you arrive at your desired version, but you might find it easier to just manually index your data on the new cluster.

      If you’re using the security plugin, snapshots have some additional restrictions:

      • To perform snapshot and restore operations, users must have the built-in manage_snapshots role.
      • You can’t restore snapshots that contain global state or the .opendistro_security index.

      If a snapshot contains global state, you must exclude it when performing the restore. If your snapshot also contains the .opendistro_security index, either exclude it or list all the other indices you want to include:
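As a sketch of those two options, the following Python helper builds a restore request body that always excludes global state and either excludes the .opendistro_security index or lists the indices to include. The helper name is an assumption for this example; the resulting body would be sent with POST _snapshot/my-repository/3/_restore.

```python
import json
from typing import Optional, Sequence


def build_restore_body(indices_to_restore: Optional[Sequence[str]] = None) -> dict:
    """Build a restore body that leaves out global state and the security index."""
    if indices_to_restore:
        # Option 2: list every index you want, leaving the security index out.
        indices = ",".join(indices_to_restore)
    else:
        # Option 1: restore everything except the security index.
        indices = "-.opendistro_security"
    return {"indices": indices, "include_global_state": False}


print(json.dumps(build_restore_body(), indent=2))
print(json.dumps(build_restore_body(["my-index", "logs-*"]), indent=2))
```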

      The .opendistro_security index contains sensitive data, so we recommend excluding it when you take a snapshot. If you do need to restore the index from a snapshot, you must include an admin certificate in the request: