Before beginning the quickstart, it is helpful to read the Druid overview and the ingestion overview, as the tutorials refer to concepts discussed on those pages. Familiarity with Docker is also recommended.
This tutorial assumes you will download the required files from GitHub. The files are also available in a Druid installation and in the Druid sources.
Create a directory to hold the Druid Docker files.
The Druid source code contains an example docker-compose.yml file that pulls an image from Docker Hub. It is suited for use as an example environment and for experimenting with Docker-based Druid configuration and deployments. Download this file to the directory created above.
The example creates a container for each Druid service, as well as a ZooKeeper container and a PostgreSQL container that serves as the metadata store.
It also creates a named volume, druid_shared, which serves as deep storage to keep and share segments and task logs among the Druid services. The volume is mounted as /opt/shared in the container.
The Druid docker-compose.yml example uses an environment file to specify the complete Druid configuration, including the environment variables described in Configuration. This file is named environment by default and must be in the same directory as the docker-compose.yml file. Download the example environment file to the directory created above. The options in this file work well for trying Druid and for following the tutorial.
The single-file approach is inadequate for a production system. Instead, we suggest using either DRUID_CONFIG_COMMON and DRUID_CONFIG_${service} or specially tailored, service-specific environment files.
Configuration of the Druid Docker container is done via environment variables set within the container. Docker Compose passes the values from the environment file into the container. The variables may additionally specify paths to the standard Druid configuration files, which must be available within the container.
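To make the DRUID_CONFIG_${service} naming concrete, here is a short shell sketch. It is not the code of the container's launch script; it only illustrates how a per-service variable name could be resolved for a given service (for example, broker resolves to DRUID_CONFIG_broker), alongside DRUID_CONFIG_COMMON. It assumes the service name is used verbatim as the suffix; the function name resolve_druid_config and the example paths are hypothetical.

```shell
# Hypothetical helper (not part of the Druid image): print which config files
# would be picked up for one service, from DRUID_CONFIG_COMMON and
# DRUID_CONFIG_<service>. Only pass trusted service names, because eval is used.
resolve_druid_config() {
    service="$1"
    # Build the variable name, e.g. DRUID_CONFIG_broker, and read it indirectly.
    # POSIX sh has no ${!name} expansion, so eval performs the lookup.
    var_name="DRUID_CONFIG_${service}"
    eval "service_file=\${${var_name}:-}"

    # Shared properties apply to every service.
    if [ -n "${DRUID_CONFIG_COMMON:-}" ]; then
        echo "common: ${DRUID_CONFIG_COMMON}"
    fi
    # Service-specific properties apply only to the named service.
    if [ -n "$service_file" ]; then
        echo "${service}: ${service_file}"
    fi
}
```

For example, with DRUID_CONFIG_COMMON and DRUID_CONFIG_broker both set, resolve_druid_config broker prints two lines, while resolve_druid_config router prints only the common entry.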
Basic configuration:
- DRUID_MAXDIRECTMEMORYSIZE: set Java max direct memory size. Default is 6 GiB.
- DRUID_XMX: set Java Xmx, the maximum heap size. Default is 1 GB.
Production configuration:
- DRUID_CONFIG_COMMON: full path to a file for Druid common properties.
- JAVA_OPTS: set Java options.
Logging configuration:
- DRUID_LOG4J: set the entire Log4j configuration verbatim.
- DRUID_LOG_LEVEL: override the default log level.
Advanced memory configuration:
- DRUID_XMS: set Java Xms, the initial heap size. Default is 1 GB.
- DRUID_MAXNEWSIZE: set Java MaxNewSize, the maximum size of the young generation.
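The memory variables above correspond to standard JVM options. The following sketch shows that mapping, assuming each value is passed through unchanged in JVM size syntax (such as 1g or 6g). The function name druid_jvm_memory_flags is hypothetical, and the real launch script may assemble these options differently.

```shell
# Illustrative sketch (not the real /druid.sh): turn the memory-related
# environment variables into the JVM flags they correspond to.
druid_jvm_memory_flags() {
    flags=""
    # Append a flag only when its variable is set and non-empty.
    [ -n "${DRUID_XMX:-}" ] && flags="$flags -Xmx${DRUID_XMX}"
    [ -n "${DRUID_XMS:-}" ] && flags="$flags -Xms${DRUID_XMS}"
    [ -n "${DRUID_MAXNEWSIZE:-}" ] && flags="$flags -XX:MaxNewSize=${DRUID_MAXNEWSIZE}"
    [ -n "${DRUID_MAXDIRECTMEMORYSIZE:-}" ] && flags="$flags -XX:MaxDirectMemorySize=${DRUID_MAXDIRECTMEMORYSIZE}"
    # Drop the single leading space before printing.
    echo "${flags# }"
}
```

With only DRUID_XMX=1g and DRUID_MAXDIRECTMEMORYSIZE=6g set, the function prints -Xmx1g -XX:MaxDirectMemorySize=6g.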
In addition to the special environment variables, the script that launches Druid in the container uses any environment variable starting with the druid_ prefix as command-line configuration. For example, an environment variable
druid_metadata_storage_type=postgresql
is translated into the following option in the Java launch command for the Druid process in the container:
-Ddruid.metadata.storage.type=postgresql
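The translation rule can be sketched in a few lines of shell. This is a minimal illustration that assumes exactly the rule described above: keep variables whose names start with druid_, replace every underscore in the name with a dot, and prefix the result with -D. The function name druid_env_to_java_opts is hypothetical, and the real launch script may handle edge cases differently.

```shell
# Sketch of the documented rule: each druid_* environment variable becomes a
# -D system property, with every "_" in the name replaced by ".".
# Reads NAME=value lines on stdin (for example, the output of `env`);
# values that contain newlines are not handled.
druid_env_to_java_opts() {
    # "|| [ -n "$line" ]" also processes a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            druid_*=*)
                name="${line%%=*}"   # text before the first "="
                value="${line#*=}"   # text after the first "=", kept verbatim
                key=$(printf '%s' "$name" | tr '_' '.')
                printf '%s\n' "-D${key}=${value}"
                ;;
        esac
    done
}
```

Running env | druid_env_to_java_opts inside a container lists the -D options this rule would produce from its environment.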
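Note that Druid uses port 8888 for the web console. This port is also used by Jupyter and other tools. To avoid conflicts, you can change the host port in the ports section of the docker-compose.yml file. For example, to expose the console on port 9999 of the host, change the router service's port mapping to "9999:8888" (host port first, container port second).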
Run docker-compose up to launch the cluster in the foreground, with the container output attached to your terminal, or docker-compose up -d to run the cluster in the background.
Once the cluster has started, you can navigate to the web console at http://localhost:8888. The Druid router process serves the UI.
It takes a few seconds for all the Druid processes to fully start up. If you open the console immediately after starting the services, you may see some errors that you can safely ignore.
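Instead of reloading the console until the errors stop, you can poll a health endpoint from a script. This sketch assumes curl is installed and that the router answers GET /status/health with true once it is ready. The function name wait_for_druid, the default of 30 attempts, and the two-second interval are arbitrary choices. A healthy router does not guarantee that every other Druid service has finished starting.

```shell
# Poll a Druid health endpoint until it answers "true" or attempts run out.
# Usage: wait_for_druid URL [MAX_ATTEMPTS]
wait_for_druid() {
    url="$1"
    max_attempts="${2:-30}"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -s: silent, -f: fail on HTTP errors, --max-time: avoid hanging.
        if [ "$(curl -sf --max-time 5 "$url" 2>/dev/null)" = "true" ]; then
            echo "Druid is up after $attempt attempt(s)"
            return 0
        fi
        # Sleep only when another attempt will follow.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep 2
        fi
        attempt=$((attempt + 1))
    done
    echo "Druid did not become healthy at $url" >&2
    return 1
}
```

A typical invocation is docker-compose up -d && wait_for_druid http://localhost:8888/status/health.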
From here you can follow along with the Quickstart. For production use, refine your docker-compose.yml file to add any additional external service dependencies as necessary.
You can explore the Druid containers using Docker to start a shell:
docker exec -ti <id> sh
Where <id> is the container ID (or name) shown by docker ps. Druid is installed in /opt/druid. The script that consumes the environment variables mentioned above, and that launches Druid, is located at /druid.sh.
Run docker-compose down to shut down the cluster. Your data is persisted as a set of Docker volumes and will be available when you restart your Druid cluster.
The default docker-compose.yml launches eight containers: ZooKeeper, PostgreSQL, and six Druid containers. Each Druid service is configured to use up to 7 GB of memory (6 GB of direct memory and 1 GB of heap). However, the Quickstart does not use all of the available memory.
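As a rough worked bound, taking the per-service figures above at face value, the six Druid containers together are configured for at most:

```latex
6 \times (6\,\mathrm{GB} + 1\,\mathrm{GB}) = 6 \times 7\,\mathrm{GB} = 42\,\mathrm{GB}
```

This counts only Java direct memory and heap for the Druid services; ZooKeeper, PostgreSQL, and other JVM overhead come on top of it.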