To use this Apache Druid extension, add druid-s3-extensions to the extensions load list.
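
For example, the load list is set in the common runtime properties; a minimal entry, assuming no other extensions are needed, might look like this:

    druid.extensions.loadList=["druid-s3-extensions"]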

The S3 input source is supported by the Parallel task to read objects directly from S3. If you use the Hadoop task, you can read data from S3 by specifying the S3 paths in your inputSpec.
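
As a sketch, the ioConfig of a Parallel (native batch) task that reads from S3 might look like the following; the bucket name, prefix, and input format are placeholders:

    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "s3",
        "prefixes": ["s3://example-bucket/example-prefix/"]
      },
      "inputFormat": { "type": "json" }
    }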

To read objects from S3 with this extension, you first need to configure how Druid connects to S3.
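
For example, static credentials can be supplied in the common runtime properties; the key values below are placeholders:

    druid.s3.accessKey=YOUR_ACCESS_KEY
    druid.s3.secretKey=YOUR_SECRET_KEY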

Deep Storage

S3 deep storage needs to be explicitly enabled by setting druid.storage.type=s3. Only after setting the storage type to S3 will any of the settings below take effect.

To correctly configure this extension for deep storage in S3, first configure how to connect to S3. In addition, you need to set configuration that is specific to deep storage.

Deep storage specific configuration
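
A minimal sketch of the deep storage settings in the common runtime properties, assuming a bucket named example-bucket and a base key of druid/segments:

    druid.storage.type=s3
    druid.storage.bucket=example-bucket
    druid.storage.baseKey=druid/segments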

Druid uses a credentials provider chain to connect to your S3 bucket (whether a deep storage bucket or a source bucket). Note: You can override the default credentials provider chain for connecting to the source bucket by specifying an access key and secret key using parameters in the ingestionSpec.
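
As a sketch of such an override, the S3 input source can carry credentials in its properties field; the bucket, prefix, and key values below are placeholders, and plain strings are shown although a password provider can be used instead:

    "inputSource": {
      "type": "s3",
      "prefixes": ["s3://example-source-bucket/example-prefix/"],
      "properties": {
        "accessKeyId": "YOUR_ACCESS_KEY",
        "secretAccessKey": "YOUR_SECRET_KEY"
      }
    }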

S3 permissions settings

s3:GetObject and s3:PutObject are required for pushing segments to and loading segments from S3. If druid.storage.disableAcl is set to false, then s3:GetBucketAcl and s3:PutObjectAcl are additionally required in order to set ACLs on objects.
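
For illustration, an IAM policy granting these permissions on a hypothetical deep storage bucket named example-bucket might look like the following; include the ACL actions only if druid.storage.disableAcl is false:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject", "s3:PutObjectAcl"],
          "Resource": "arn:aws:s3:::example-bucket/*"
        },
        {
          "Effect": "Allow",
          "Action": ["s3:GetBucketAcl"],
          "Resource": "arn:aws:s3:::example-bucket"
        }
      ]
    }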

The AWS SDK requires that the target region be specified. You can do this by setting the JVM system property aws.region or the environment variable AWS_REGION.

As an example, to set the region to ‘us-east-1’ through system properties:

  • Add -Daws.region=us-east-1 to the jvm.config file for all Druid services.
  • Add -Daws.region=us-east-1 to druid.indexer.runner.javaOpts in the Middle Manager configuration (runtime.properties) so that the property is passed to Peon (worker) processes, as sketched after this list.
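
For the second item, a sketch of the Middle Manager runtime.properties line; the other JVM options shown are placeholders:

    # Pass the region to Peon (worker) processes spawned by the Middle Manager
    druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -Daws.region=us-east-1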

Connecting to S3 configuration