S3-compatible

    • Ingest data from files stored in S3.
    • Write segments to deep storage in S3.

    To use this Apache Druid extension, include it in the extensions load list.
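    As a rough illustration of the load-list step, the Python sketch below updates the value of the druid.extensions.loadList property, which Druid reads as a JSON array. The property name and the extension name druid-s3-extensions are assumptions based on the standard Druid distribution, and add_to_load_list is a hypothetical helper that only edits the string value.

```python
import json

# Assumption: the S3 extension ships under this name; confirm it against the
# extensions directory of your Druid distribution.
S3_EXTENSION = "druid-s3-extensions"


def add_to_load_list(load_list_value: str, extension: str = S3_EXTENSION) -> str:
    """Return a druid.extensions.loadList value (a JSON array) that includes `extension`."""
    extensions = json.loads(load_list_value) if load_list_value.strip() else []
    if extension not in extensions:
        extensions.append(extension)  # keep the existing order, append only if missing
    return json.dumps(extensions)


if __name__ == "__main__":
    current = '["druid-datasketches"]'  # hypothetical existing load list
    print("druid.extensions.loadList=" + add_to_load_list(current))
```

    Appending only when the name is missing keeps the helper idempotent, so running it twice does not duplicate the entry.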

    Use a native batch ingestion task with an S3 input source to read objects directly from S3.
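    A minimal sketch of what the S3 part of a native batch spec can look like, built as a Python dict. It assumes the parallel task type index_parallel, an input source of type "s3" that takes either a uris list (individual objects) or a prefixes list (everything under a path), and a JSON input format; the helper name and the bucket are hypothetical, and a complete spec also needs a dataSchema and tuningConfig.

```python
import json


def s3_native_batch_io_config(uris=None, prefixes=None, input_format="json"):
    """Build the ioConfig part of a native batch (index_parallel) ingestion spec
    that reads from S3. Exactly one of `uris` or `prefixes` must be given."""
    if bool(uris) == bool(prefixes):
        raise ValueError("set exactly one of uris or prefixes")
    input_source = {"type": "s3"}
    if uris:
        input_source["uris"] = list(uris)          # individual objects
    else:
        input_source["prefixes"] = list(prefixes)  # every object under each prefix
    return {
        "type": "index_parallel",
        "inputSource": input_source,
        "inputFormat": {"type": input_format},
    }


if __name__ == "__main__":
    # Hypothetical bucket and path.
    spec = s3_native_batch_io_config(prefixes=["s3://example-bucket/events/"])
    print(json.dumps(spec, indent=2))
```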

    Alternatively, use a Hadoop-based ingestion task and specify S3 paths in your inputSpec.

    To read objects from S3, you must supply connection information in your configuration.

    Deep Storage

    S3-compatible deep storage means either AWS S3 or a compatible service, such as Google Storage, that exposes the same API as S3.

    You must explicitly enable S3 deep storage by setting druid.storage.type=s3. None of the settings below take effect until the storage type is set to S3.
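    The sketch below renders a minimal deep storage block as key=value lines for a Druid properties file. Only druid.storage.type=s3 comes from the text above; the bucket and key-prefix property names (druid.storage.bucket, druid.storage.baseKey) and the sample values are assumptions to check against the settings listed below.

```python
def s3_deep_storage_properties(bucket: str, base_key: str) -> dict:
    """Deep storage settings for S3. druid.storage.type=s3 is required; the bucket
    and baseKey property names are assumed here."""
    return {
        "druid.storage.type": "s3",        # without this, the other S3 settings are ignored
        "druid.storage.bucket": bucket,     # bucket that holds segments
        "druid.storage.baseKey": base_key,  # key prefix for segments in the bucket
    }


def render_properties(props: dict) -> str:
    """Render settings as key=value lines for a Druid properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical bucket and prefix.
    print(render_properties(s3_deep_storage_properties("example-bucket", "druid/segments")))
```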

    Deep storage specific configuration

    You can provide credentials to connect to S3 in a number of ways, whether for deep storage or for reading from S3 as an ingestion source.

    The configuration options are listed in order of precedence. For example, if you want to use the profile information in ~/.aws/credentials, do not set druid.s3.accessKey and druid.s3.secretKey in your Druid config file, because those keys take precedence.

    For more information, refer to the Amazon Developer Guide.
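    To make the precedence rule concrete, here is a simplified Python model of a first-match credentials chain with only two sources: the Druid config keys and a profile in ~/.aws/credentials. The real chain has more sources and is implemented by Druid and the AWS SDK, not by this code. CredentialSource and resolve_credentials are hypothetical names; the profile field names aws_access_key_id and aws_secret_access_key follow the standard AWS credentials file format.

```python
from typing import Mapping, NamedTuple, Optional, Sequence, Tuple


class CredentialSource(NamedTuple):
    name: str                  # label used only to report which source won
    values: Mapping[str, str]  # settings read from this source
    access_key_field: str
    secret_key_field: str


def resolve_credentials(chain: Sequence[CredentialSource]) -> Optional[Tuple[str, str, str]]:
    """Walk the chain in order of precedence and return (source name, access key,
    secret key) from the first source that defines both keys."""
    for source in chain:
        access_key = source.values.get(source.access_key_field)
        secret_key = source.values.get(source.secret_key_field)
        if access_key and secret_key:
            return source.name, access_key, secret_key
    return None


if __name__ == "__main__":
    druid_config = {"druid.s3.accessKey": "CONFIG_KEY", "druid.s3.secretKey": "CONFIG_SECRET"}
    profile = {"aws_access_key_id": "PROFILE_KEY", "aws_secret_access_key": "PROFILE_SECRET"}
    chain = [
        CredentialSource("Druid config", druid_config, "druid.s3.accessKey", "druid.s3.secretKey"),
        CredentialSource("~/.aws/credentials", profile, "aws_access_key_id", "aws_secret_access_key"),
    ]
    print(resolve_credentials(chain)[0])  # the Druid config keys win
    chain[0] = chain[0]._replace(values={})
    print(resolve_credentials(chain)[0])  # with no config keys, the profile is used
```

    The second lookup shows why the text advises leaving druid.s3.accessKey and druid.s3.secretKey unset when you rely on a profile: the profile is consulted only when the earlier source has nothing.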

    Alternatively, you can bypass this chain by specifying an access key and secret key inside your ingestion specification.
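    A sketch of an S3 input source that carries its own keys, assuming Druid's S3 input source accepts a properties object with accessKeyId and secretAccessKey, each given as a password provider of the form {"type": "environment", "variable": ...}. These field names are assumptions to verify against your Druid version; the helper, object URI, and variable names are hypothetical.

```python
import json


def s3_input_source_with_keys(uris, access_key_env: str, secret_key_env: str) -> dict:
    """S3 input source that carries its own credentials instead of relying on the
    default chain. Each key is an environment-variable password provider, so the
    literal secret never appears in the ingestion spec."""
    return {
        "type": "s3",
        "uris": list(uris),
        "properties": {
            "accessKeyId": {"type": "environment", "variable": access_key_env},
            "secretAccessKey": {"type": "environment", "variable": secret_key_env},
        },
    }


if __name__ == "__main__":
    source = s3_input_source_with_keys(
        ["s3://example-bucket/events/2024-01-01.json"],  # hypothetical object
        "INGEST_S3_ACCESS_KEY",                           # hypothetical variable names
        "INGEST_S3_SECRET_KEY",
    )
    print(json.dumps(source, indent=2))
```

    Referencing environment variables rather than pasting keys keeps secrets out of task specs, which Druid stores and can display.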

    To mask credentials in Druid logs, set druid.startup.logging.maskProperties to a list of words that identify sensitive properties, for example, ["password", "secretKey", "awsSecretAccessKey"].
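    A minimal Python approximation of the masking, assuming Druid hides the value of any property whose name contains one of the listed words (modeled here as a case-sensitive substring match). mask_properties and the "<masked>" placeholder are hypothetical; Druid's actual log output may differ.

```python
import json

# The list from the example above, as it would appear in the Druid config:
# druid.startup.logging.maskProperties=["password", "secretKey", "awsSecretAccessKey"]
MASK_WORDS = json.loads('["password", "secretKey", "awsSecretAccessKey"]')


def mask_properties(properties: dict, mask_words=MASK_WORDS, placeholder="<masked>") -> dict:
    """Return a copy of `properties` in which the value of every property whose name
    contains one of `mask_words` is replaced by `placeholder`."""
    return {
        name: placeholder if any(word in name for word in mask_words) else value
        for name, value in properties.items()
    }


if __name__ == "__main__":
    props = {"druid.s3.accessKey": "AKIAEXAMPLE", "druid.s3.secretKey": "example-secret"}
    print(mask_properties(props))
    # druid.s3.accessKey matches no word in the list, so its value is kept.
```

    Under this assumption, the example list masks druid.s3.secretKey but leaves druid.s3.accessKey visible; add a matching word if you want access keys hidden too.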

    S3 permissions settings

    To manage the permissions for objects in an S3 bucket, you can use either ACLs or Object Ownership. The permissions required for each method are different.

    You can switch from ACLs to Object Ownership by setting druid.storage.disableAcl to true. With Object Ownership, the bucket owner owns every object Druid creates, so you must use S3 bucket policies to manage permissions.

    Note that this setting only affects Druid’s behavior. Changing S3 to use Object Ownership requires additional configuration. For more information, see the AWS documentation on Controlling ownership of objects and disabling ACLs for your bucket.

    ACL permissions

    If you’re using ACLs, Druid needs the following permissions:

    • s3:GetObject
    • s3:PutObject
    • s3:DeleteObject
    • s3:PutObjectAcl

    Object Ownership permissions

    If you’re using Object Ownership, Druid needs the following permissions:

    • s3:GetObject
    • s3:PutObject
    • s3:DeleteObject
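    The two permission lists differ only by s3:PutObjectAcl. The Python sketch below derives the action list from the druid.storage.disableAcl choice and wraps it in an IAM identity policy for the role or user Druid runs as. Scoping the resource to a bucket and key prefix is an assumption for illustration; the function names, bucket, and prefix are hypothetical. The policy grants only the actions listed above.

```python
import json

# Permissions from the two lists above.
OBJECT_ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
ACL_ACTIONS = ["s3:PutObjectAcl"]


def required_actions(disable_acl: bool) -> list:
    """Object Ownership (druid.storage.disableAcl=true) needs only the object actions;
    ACL mode also needs s3:PutObjectAcl."""
    return list(OBJECT_ACTIONS) if disable_acl else OBJECT_ACTIONS + ACL_ACTIONS


def deep_storage_policy(bucket: str, base_key: str, disable_acl: bool) -> dict:
    """IAM policy document granting those actions on objects under bucket/base_key."""
    prefix = base_key.strip("/")
    resource = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": required_actions(disable_acl), "Resource": resource}
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and key prefix, ACL mode.
    print(json.dumps(deep_storage_policy("example-bucket", "druid/segments", disable_acl=False), indent=2))
```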

    The AWS SDK requires a target region. You can set it with the JVM system property aws.region or with the environment variable AWS_REGION.

    For example, to set the region to ‘us-east-1’ through system properties:

    • Add -Daws.region=us-east-1 to the jvm.config file for all Druid services.
    • Add -Daws.region=us-east-1 to druid.indexer.runner.javaOpts in the Middle Manager configuration so that the property is passed to Peon (worker) processes.
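    Both steps add the same flag to two places that use different layouts: jvm.config lists one JVM option per line, while druid.indexer.runner.javaOpts is a single space-separated string. The hypothetical Python helpers below apply the flag to either form and replace any earlier aws.region value so the option appears only once.

```python
def with_region_flag(jvm_options: list, region: str) -> list:
    """Return JVM options with exactly one -Daws.region=<region> entry,
    replacing any existing aws.region setting."""
    kept = [opt for opt in jvm_options if not opt.startswith("-Daws.region=")]
    return kept + [f"-Daws.region={region}"]


def jvm_config_with_region(jvm_config_text: str, region: str) -> str:
    """jvm.config holds one JVM option per line."""
    options = [line.strip() for line in jvm_config_text.splitlines() if line.strip()]
    return "\n".join(with_region_flag(options, region)) + "\n"


def java_opts_with_region(java_opts: str, region: str) -> str:
    """druid.indexer.runner.javaOpts is a space-separated string. Splitting on
    whitespace is a simplification: it breaks options that contain quoted spaces."""
    return " ".join(with_region_flag(java_opts.split(), region))


if __name__ == "__main__":
    print(jvm_config_with_region("-server\n-Xms1g\n-Xmx1g\n", "us-east-1"), end="")
    print(java_opts_with_region("-server -Xmx2g", "us-east-1"))
```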

    Connecting to S3 configuration