- Write segments to deep storage in S3.
To use this Apache Druid extension, include `druid-s3-extensions` in the extensions load list.
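For example, a sketch of the load list in `common.runtime.properties` (your load list may contain other extensions as well):

```properties
druid.extensions.loadList=["druid-s3-extensions"]
```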
The S3 input source is supported by the Parallel task to read objects directly from S3. If you use the Hadoop task, you can read data from S3 by specifying the S3 paths in your `inputSpec`.
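As a sketch, the `ioConfig` of a native batch ingestion spec using the S3 input source might look like the following (the bucket and prefix are placeholders):

```json
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "s3",
    "prefixes": ["s3://my-bucket/my-prefix/"]
  },
  "inputFormat": {
    "type": "json"
  }
}
```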
To configure the extension to read objects from S3, you need to configure how to connect to S3.
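A minimal sketch of the connection settings, assuming static credentials (the values are placeholders; credentials can also come from the provider chain described below):

```properties
druid.s3.accessKey=YOUR_ACCESS_KEY
druid.s3.secretKey=YOUR_SECRET_KEY
```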
Deep Storage
S3 deep storage needs to be explicitly enabled by setting `druid.storage.type=s3`. Only after setting the storage type to S3 will any of the settings below take effect.
To correctly configure this extension for deep storage in S3, first configure how to connect to S3. In addition, you need to set additional configuration specific to deep storage.
Deep storage specific configuration
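A sketch of typical deep-storage settings (the bucket name and base key shown are placeholders):

```properties
druid.storage.type=s3
druid.storage.bucket=my-bucket
druid.storage.baseKey=druid/segments
```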
Druid uses a credentials provider chain to connect to your S3 bucket (whether a deep storage bucket or a source bucket). Note: You can override the default credentials provider chain for connecting to the source bucket by specifying an access key and secret key using parameters in the ingestionSpec.
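For example, the S3 input source accepts a `properties` object for this override; a hedged sketch with placeholder values:

```json
"inputSource": {
  "type": "s3",
  "uris": ["s3://my-bucket/path/to/file.json"],
  "properties": {
    "accessKeyId": "YOUR_ACCESS_KEY",
    "secretAccessKey": "YOUR_SECRET_KEY"
  }
}
```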
S3 permissions settings
`s3:GetObject` and `s3:PutObject` are required for pushing and loading segments to and from S3. If `druid.storage.disableAcl` is set to `false`, then `s3:GetBucketAcl` and `s3:PutObjectAcl` are additionally required to set the ACL for objects.
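As an illustration only, an IAM policy granting these actions might look like the following sketch (the bucket name is a placeholder; scope the resources to your own bucket):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetBucketAcl", "s3:PutObjectAcl"],
      "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]
    }
  ]
}
```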
The AWS SDK requires that the target region be specified. Two ways of doing this are by using the JVM system property `aws.region` or the environment variable `AWS_REGION`.
As an example, to set the region to ‘us-east-1’ through system properties:
- Add `-Daws.region=us-east-1` to the jvm.config file for all Druid services.
- Add `-Daws.region=us-east-1` to `druid.indexer.runner.javaOpts` in the Middle Manager configuration so that the property is passed to the Peon (worker) processes.
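For instance, a jvm.config might include the flag among its other JVM arguments (the surrounding flags are illustrative only):

```
-server
-Xms1g
-Xmx1g
-Daws.region=us-east-1
```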