Use Azure BlobStore offloader with Pulsar

Follow the steps below to install the Azure BlobStore offloader.

  • Prerequisite: Pulsar 2.6.2 or later.

This example uses Pulsar 2.6.2.

  1. Download the Pulsar tarball using one of the following ways:

  2. Download and untar the Pulsar offloaders package.

    wget https://downloads.apache.org/pulsar/pulsar-2.6.2/apache-pulsar-offloaders-2.6.2-bin.tar.gz
    tar xvfz apache-pulsar-offloaders-2.6.2-bin.tar.gz

  3. Copy the Pulsar offloaders as offloaders in the Pulsar directory.

    mv apache-pulsar-offloaders-2.6.2/offloaders apache-pulsar-2.6.2/offloaders
    ls apache-pulsar-2.6.2/offloaders
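As a quick sanity check after the copy step, a small script can confirm that the target directory holds offloader archives. This is only a sketch: it assumes the offloaders ship as `.nar` files, and the `has_offloader_archives` helper is hypothetical, not part of Pulsar.

```shell
# Hypothetical helper: succeed if the given directory contains at least one .nar file.
has_offloader_archives() {
  for f in "$1"/*.nar; do
    [ -e "$f" ] && return 0
  done
  return 1
}

# Install location assumed from the steps above.
if has_offloader_archives "apache-pulsar-2.6.2/offloaders"; then
  echo "offloaders found"
else
  echo "no offloader archives found"
fi
```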

Configuration

note

Before offloading data from BookKeeper to Azure BlobStore, you need to configure some properties of the Azure BlobStore offload driver.

You can configure the Azure BlobStore offloader driver in the configuration file broker.conf or standalone.conf.

  • Required configurations are as below.

  • Optional configurations are as below.

Bucket (required)

A bucket is a basic container that holds your data. Everything you store in Azure BlobStore must be contained in a bucket. You can use buckets to organize your data and control access to it, but unlike directories and folders, buckets cannot be nested.

Example

This example names the bucket pulsar-topic-offload.

    managedLedgerOffloadBucket=pulsar-topic-offload
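For context, here is a minimal sketch of how the bucket setting might sit alongside the driver settings. The `managedLedgerOffloadDriver=azureblob` and `offloadersDirectory=offloaders` lines are assumptions based on Pulsar's offloader configuration and are not shown in this section. The scratch file path is hypothetical.

```shell
# Write the offloader settings to a scratch file (hypothetical path) for review
# before merging them into conf/broker.conf or conf/standalone.conf.
CONF_SNIPPET="${TMPDIR:-/tmp}/azure-offload.conf"

cat > "$CONF_SNIPPET" <<'EOF'
# Assumed driver name and offloader directory; check the docs for your Pulsar version.
managedLedgerOffloadDriver=azureblob
offloadersDirectory=offloaders
# Bucket name taken from the example above.
managedLedgerOffloadBucket=pulsar-topic-offload
EOF

cat "$CONF_SNIPPET"
```

Writing to a scratch file first keeps the live configuration untouched until the values have been checked.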

Authentication (required)

To be able to access Azure BlobStore, you need to authenticate with Azure BlobStore.

  • Set the environment variables AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY in conf/pulsar_env.sh.

    “export” is important so that the variables are made available in the environment of spawned processes.

    export AZURE_STORAGE_ACCOUNT=ABC123456789
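Since both variables are required, a fuller conf/pulsar_env.sh fragment might look like the sketch below. The access key value is a placeholder. The `require_var` helper is hypothetical, added only to fail fast when a variable is missing.

```shell
# Placeholder credentials; replace with your storage account name and access key.
export AZURE_STORAGE_ACCOUNT=ABC123456789
export AZURE_STORAGE_ACCESS_KEY=replace-with-your-access-key

# Hypothetical helper: report whether a named variable is set and non-empty.
require_var() {
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "missing: $1"
    return 1
  fi
  echo "ok: $1"
}

require_var AZURE_STORAGE_ACCOUNT
require_var AZURE_STORAGE_ACCESS_KEY
```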

Size of block read/write

You can configure the size of a request sent to or read from Azure BlobStore in the configuration file broker.conf or standalone.conf.
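The property names are not listed in this section. The sketch below assumes `managedLedgerOffloadReadBufferSizeInBytes` and `managedLedgerOffloadMaxBlockSizeInBytes` from Pulsar's offloader configuration, with illustrative sizes. It only converts megabytes to bytes and prints lines you might add to broker.conf.

```shell
# Convert a size in MB to bytes.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Illustrative values (assumed, not from this section): 1 MB reads, 64 MB upload blocks.
echo "managedLedgerOffloadReadBufferSizeInBytes=$(mb_to_bytes 1)"
echo "managedLedgerOffloadMaxBlockSizeInBytes=$(mb_to_bytes 64)"
```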

Automatic offloading runs when a new segment is added to a topic log. If you set the threshold on a namespace but few messages are being produced to the topic, the offloader does not run until the current segment is full.

You can configure the threshold size using CLI tools, such as pulsar-admin.

The offload configurations in broker.conf and standalone.conf apply to namespaces that do not have namespace-level offload policies. Each namespace can have its own offload policy. To set the offload policy for a namespace, use the pulsar-admin namespaces set-offload-policies command.

Example

This example sets the Azure BlobStore offloader threshold size to 10 MB using pulsar-admin.
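The command for this example does not appear above. A likely form is sketched below, assuming the `--size` flag of `pulsar-admin namespaces set-offload-threshold` and a hypothetical namespace `my-tenant/my-namespace`. The script prints the command instead of running it so that it can be reviewed first.

```shell
# Hypothetical namespace; the 10M threshold comes from the example above.
NAMESPACE="my-tenant/my-namespace"
THRESHOLD="10M"

# Build the command string; the --size flag is an assumption, verify it with --help.
build_threshold_cmd() {
  echo "bin/pulsar-admin namespaces set-offload-threshold --size $2 $1"
}

build_threshold_cmd "$NAMESPACE" "$THRESHOLD"
```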

tip

For more information about the pulsar-admin namespaces set-offload-threshold options command, including flags, descriptions, and default values, see the pulsar-admin reference documentation.

For individual topics, you can trigger Azure BlobStore offloader manually using one of the following methods:

  • Use the REST endpoint.

  • Use CLI tools (such as pulsar-admin).

    To trigger it via CLI tools, you need to specify the maximum amount of data (threshold) that should be retained on a Pulsar cluster for a topic. If the size of the topic data on the Pulsar cluster exceeds this threshold, segments from the topic are moved to Azure BlobStore until the threshold is no longer exceeded. Older segments are moved first.
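The threshold rule described above can be illustrated with a simplified model. This is a sketch of the stated behavior only, not Pulsar's implementation: the `simulate_offload` function and the segment sizes are hypothetical, and details such as the still-open current segment are ignored.

```shell
# Simplified model: given a threshold and segment sizes (oldest first, same unit),
# count how many of the oldest segments must move before the retained size
# no longer exceeds the threshold.
simulate_offload() {
  threshold=$1
  shift
  total=0
  for size in "$@"; do
    total=$((total + size))
  done
  offloaded=0
  for size in "$@"; do
    if [ "$total" -le "$threshold" ]; then
      break
    fi
    total=$((total - size))
    offloaded=$((offloaded + 1))
  done
  echo "offloaded=$offloaded retained=$total"
}

# Hypothetical topic with four 4 MB segments and a 10 MB threshold.
simulate_offload 10 4 4 4 4
```

With these inputs the two oldest segments are moved, leaving 8 MB on the cluster, which is under the 10 MB threshold.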

Example

    bin/pulsar-admin topics offload --size-threshold 10M my-tenant/my-namespace/topic1

```
Offload triggered for persistent://my-tenant/my-namespace/topic1 for messages before 2:0:-1
```

##### tip

For more information about the `pulsar-admin topics offload options` command, including flags, descriptions, and default values, see [here](https://pulsar.apache.org/tools/pulsar-admin/2.6.0-SNAPSHOT/#-em-offload-em-).

  • This example checks the Azure BlobStore offloader status using `bin/pulsar-admin topics offload-status persistent://my-tenant/my-namespace/topic1`.

**Output**

```
Offload is currently running
```

To wait for the Azure BlobStore offloader to complete the job, add the `-w` flag.

```
bin/pulsar-admin topics offload-status -w persistent://my-tenant/my-namespace/topic1
```

**Output**

```
Offload was a success
```

If there is an error in offloading, the error is propagated to the `pulsar-admin topics offload-status` command.

```
bin/pulsar-admin topics offload-status persistent://my-tenant/my-namespace/topic1
```

**Output**

```
Error in offload
null
Reason: Error offloading: org.apache.bookkeeper.mledger.ManagedLedgerException:
```

##### tip