Quickstart (local)

    In this quickstart, you’ll do the following:

    • install Druid
    • start up Druid services
    • use SQL to ingest and query data

    Druid supports a variety of ingestion options. Once you’re done with this tutorial, refer to the Ingestion page to determine which ingestion method is right for you.

    You can follow these steps on a relatively modest machine, such as a workstation or virtual server with 16 GiB of RAM.

    Druid comes equipped with several startup configuration profiles for a range of machine sizes. These range from micro-quickstart (1 CPU, 4 GiB RAM) to x-large (64 CPU, 512 GiB RAM). For more information, see the single-server deployment documentation. For information on deploying Druid services across clustered machines, see Clustered deployment.

    The software requirements for the installation machine are:

    • Linux, Mac OS X, or other Unix-like OS. (Windows is not supported.)
    • Java 8u92+ or Java 11.

    Before installing a production Druid instance, be sure to review the security documentation. In general, avoid running Druid as the root user. Consider creating a dedicated user account for running Druid.

    Install Druid

    Download the Druid 24.0.2 distribution from Apache Druid. For this quickstart, you need Druid version 24.0 or higher. For versions earlier than 24.0 (0.23 and below), see Load data with native batch ingestion.

    In your terminal, extract the file and change directories to the distribution directory:

    The distribution directory contains LICENSE and NOTICE files and subdirectories for executable files, configuration files, sample data, and more.

    Start up Druid services using the single-machine configuration. This configuration includes default settings that are appropriate for this tutorial, such as loading the druid-multi-stage-query extension by default so that you can use the MSQ task engine.

    You can view that setting and others in the configuration files under conf/druid/single-server/micro-quickstart/.

    From the apache-druid-24.0.2 package root, run the following command:

    This brings up instances of ZooKeeper and the Druid services (Coordinator, Overlord, Broker, Router, Historical, and MiddleManager).

    At any time, you can revert Druid to its original, post-installation state by deleting the entire var directory. You may want to do this, for example, between Druid tutorials or after experimentation, to start with a fresh instance.
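    Resetting is a single command from the distribution root. Note that this permanently deletes all ingested data, metadata, and logs under var/:

```shell
# Delete all Druid state to return to a fresh, post-installation instance
rm -rf var
```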

    To stop Druid at any time, use CTRL+C in the terminal. This exits the bin/start-micro-quickstart script and terminates all Druid processes.

    Open the web console

    After the Druid services finish startup, open the web console at http://localhost:8888.

    It may take a few seconds for all Druid services to finish starting, including the Router service, which serves the console. If you attempt to open the web console before startup is complete, you may see errors in the browser. Wait a few moments and try again.

    In this quickstart, you use the web console to perform ingestion. The MSQ task engine specifically uses the Query view to edit and run SQL queries. For a complete walkthrough of the Query view as it relates to the multi-stage query architecture and the MSQ task engine, see UI walkthrough.

    The Druid distribution bundles the wikiticker-2015-09-12-sampled.json.gz sample dataset that you can use for testing. The sample dataset is located in the quickstart/tutorial/ folder under the Druid root directory and represents Wikipedia page edits for a given day.

    Follow these steps to load the sample Wikipedia dataset:

    1. In the Query view, click Connect external data.

    2. Select the Local disk tile and enter the following values:

      • Base directory: quickstart/tutorial/

      • File filter: wikiticker-2015-09-12-sampled.json.gz


      Entering the base directory and wildcard file filter separately, as the UI allows, lets you specify multiple files for ingestion at once.

    3. On the Parse page, you can examine the raw data and perform the following optional actions before loading data into Druid:

      • Expand a row to see the corresponding source data.
      • Adjust the primary timestamp column for the data. Druid requires data to have a primary timestamp column (internally stored in a column called __time). If your dataset doesn’t have a timestamp, Druid uses the default value of 1970-01-01 00:00:00.

    4. Click Done. You’re returned to the Query view, which displays the newly generated query. The query inserts the sample data into the table named wikiticker-2015-09-12-sampled.


    5. Optionally, click Preview to see the general shape of the data before you ingest it.

    6. Edit the first line of the query and change the default destination datasource name from wikiticker-2015-09-12-sampled to wikipedia.

    7. Click Run to execute the query. The task may take a minute or two to complete. When done, the task displays its duration and the number of rows inserted into the table. The view is set to automatically refresh, so you don’t need to refresh the browser to see the status change.


      A successful task means that Druid data servers have picked up one or more segments.
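    For reference, the query the console generates for these settings is a SQL-based ingestion statement of roughly the following shape. This is an abridged sketch, not the console's exact output: the real query enumerates every input column, while only a few are shown here.

```sql
REPLACE INTO "wikiticker-2015-09-12-sampled" OVERWRITE ALL
WITH "ext" AS (
  SELECT *
  FROM TABLE(
    EXTERN(
      '{"type":"local","baseDir":"quickstart/tutorial/","filter":"wikiticker-2015-09-12-sampled.json.gz"}',
      '{"type":"json"}',
      '[{"name":"timestamp","type":"string"},{"name":"channel","type":"string"},{"name":"page","type":"string"},{"name":"added","type":"long"}]'
    )
  )
)
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "channel",
  "page",
  "added"
FROM "ext"
PARTITIONED BY DAY
```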

    Query data

    Once the ingestion job is complete, you can query the data.

    In the Query view, run the following query to produce a list of top channels:
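```sql
-- Top channels by number of edits in the sample day
-- (assumes the datasource was renamed to wikipedia in step 6)
SELECT
  "channel",
  COUNT(*) AS "Count"
FROM "wikipedia"
GROUP BY "channel"
ORDER BY COUNT(*) DESC
LIMIT 10
```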

    Congratulations! You’ve gone from downloading Druid to querying data with the MSQ task engine in just one quickstart.

    See the following topics for more information:

    • The Query tutorial to learn how to query the data you just ingested.
    • The Ingestion page to explore options for ingesting more data.
    • Tutorial: Load files using SQL to learn how to generate a SQL query that loads external data into a Druid datasource.
    • The Load data with native batch ingestion tutorial to load and query data with Druid’s native batch ingestion feature.
    • Extensions for details on Druid extensions.