Configuring Apache Druid to use Kerberized Apache Hadoop as deep storage

    Copy the following Hadoop configuration files into the Druid conf/druid/_common folder:

    1. For HDFS as deep storage: hdfs-site.xml, core-site.xml
    2. For ingestion: mapred-site.xml, yarn-site.xml
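
    As an illustration, assuming the Hadoop client configuration lives under /etc/hadoop/conf (the exact path varies by distribution), the copy might look like:

      cp /etc/hadoop/conf/hdfs-site.xml /etc/hadoop/conf/core-site.xml \
         /etc/hadoop/conf/mapred-site.xml /etc/hadoop/conf/yarn-site.xml \
         conf/druid/_common/
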
    1. Create the folder in HDFS under the required parent folder for the Druid deep storage. For example: hdfs dfs -mkdir /apps/druid

    2. Give the Druid processes appropriate permissions to access this folder. This ensures that Druid can create the necessary folders, such as data and indexing_log, in HDFS. For example, if the Druid processes run as user ‘root’, then

      hdfs dfs -chown root:root /apps/druid

    Druid creates the necessary sub-folders to store data and indexing logs under this newly created folder.

    Druid Setup

    Edit common.runtime.properties at conf/druid/_common/common.runtime.properties to include the HDFS properties. The folders used for the storage locations are the same as the ones created in the example above.
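
    A minimal sketch of the deep-storage properties, assuming the /apps/druid folder created above; the segments and indexing-logs sub-paths are illustrative:

      # Deep storage on HDFS
      druid.storage.type=hdfs
      druid.storage.storageDirectory=/apps/druid/segments

      # Indexing task logs on HDFS
      druid.indexer.logs.type=hdfs
      druid.indexer.logs.directory=/apps/druid/indexing-logs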

    Note: Comment out the local storage and S3 storage parameters in the file.

    Also include the druid-hdfs-storage core extension in conf/druid/_common/common.runtime.properties.
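
    For instance, the load list might look like the following; the exact contents depend on which other extensions the deployment already loads:

      druid.extensions.loadList=["druid-hdfs-storage"]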

    Ensure that Druid has the necessary jars to support the Hadoop version in use.
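
    The Hadoop version on the cluster can be checked with the standard Hadoop client, for example:

      hadoop version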

    If other software is used alongside Hadoop, ensure that

    1. the necessary libraries are available, and
    2. the requisite extensions are added to druid.extensions.loadList in conf/druid/_common/common.runtime.properties

    Create a headless keytab that has access to the Druid data and index folders.
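
    As a rough sketch, assuming MIT Kerberos and kadmin access; the principal name, realm, and keytab path below are placeholders:

      # Create a service principal for the druid processes (placeholder name and realm)
      kadmin.local -q "addprinc -randkey druid@EXAMPLE.COM"

      # Export its key to a headless keytab readable by the user running druid
      kadmin.local -q "ktadd -k /etc/security/keytabs/druid.headless.keytab druid@EXAMPLE.COM"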

    Edit conf/druid/_common/common.runtime.properties and add the Kerberos principal and keytab properties, druid.hadoop.security.kerberos.principal and druid.hadoop.security.kerberos.keytab.
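
    For example, with a placeholder principal and a keytab path matching the headless keytab created above:

      druid.hadoop.security.kerberos.principal=druid@EXAMPLE.COM
      druid.hadoop.security.kerberos.keytab=/etc/security/keytabs/druid.headless.keytab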

    With the above changes, restart Druid. This ensures that Druid works with the Kerberized Hadoop cluster.