Tutorial: Loading a file
You initiate data loading in Druid by submitting an ingestion task spec to the Druid Overlord. You can write ingestion specs by hand or with the data loader built into the web console.
For production environments, you will likely want to automate data ingestion. This tutorial first shows how to submit an ingestion spec directly in the web console, and then introduces ways to ingest batch data that lend themselves to automation: from the command line and from a script.
The Druid package includes the following sample native batch ingestion task spec at , shown here for convenience, which has been configured to read the input file:
This spec creates a datasource named “wikipedia”.
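To make the shape of such a spec concrete, here is a minimal sketch that assembles a native batch spec as a Python dictionary. It is not the packaged sample spec. The helper name `build_index_spec`, the input directory, the file filter, the timestamp column `time`, and the dimension names are all placeholders chosen for illustration, so replace them with values that match your data.

```python
import json


def build_index_spec(datasource, base_dir, file_filter, dimensions,
                     timestamp_column="time"):
    """Return a minimal native batch (index_parallel) task spec as a dict.

    Hypothetical helper: every argument value is supplied by the caller and
    nothing here is copied from the sample spec shipped with Druid.
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                # The datasource that the task creates or writes to.
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "iso"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                # Read JSON files from a local directory.
                "inputSource": {
                    "type": "local",
                    "baseDir": base_dir,
                    "filter": file_filter,
                },
                "inputFormat": {"type": "json"},
                "appendToExisting": False,
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


# Placeholder path, filter, and dimensions; adjust them to your input file.
spec = build_index_spec(
    datasource="wikipedia",
    base_dir="path/to/input/dir",
    file_filter="*.json.gz",
    dimensions=["page", "user"],
)
print(json.dumps(spec, indent=2))
```

The `dataSource` field is what determines the datasource name, which is why a spec with `"dataSource": "wikipedia"` produces a datasource called “wikipedia”.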
From the Ingestion view, click the ellipses next to Tasks and choose .
Once the spec is submitted, wait a few moments for the data to load, after which you can query it.
Loading data with a spec (via command line)
For convenience, the Druid package includes a batch ingestion helper script at .
This script will POST an ingestion task to the Druid Overlord and poll Druid until the data is available for querying.
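The polling half of that behavior can be sketched in a few lines of Python. This is not the helper script's implementation. It assumes the Overlord is reachable at `http://localhost:8081` and uses the Overlord task status endpoint. It only waits for the task to finish, and it leaves out the separate check for when the loaded data becomes queryable. The names `wait_for_task` and `http_get_json` are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: a local Overlord on port 8081; change this for your deployment.
OVERLORD_URL = "http://localhost:8081"


def http_get_json(url):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_task(task_id, fetch=http_get_json, interval=2.0, timeout=600.0,
                  sleep=time.sleep):
    """Poll the task status until it is SUCCESS or FAILED, then return it.

    `fetch` and `sleep` are injectable so the loop can be exercised without
    a running cluster.
    """
    quoted = urllib.parse.quote(task_id, safe="")
    url = f"{OVERLORD_URL}/druid/indexer/v1/task/{quoted}/status"
    deadline = time.monotonic() + timeout
    while True:
        state = fetch(url)["status"]["status"]
        if state in ("SUCCESS", "FAILED"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {state} after {timeout}s")
        sleep(interval)
```

Calling `wait_for_task("<task id>")` against a running cluster blocks until the task reaches a terminal state. Passing a stub for `fetch` makes it possible to test the loop offline.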
Run the following command from the Druid package root:
You should see output like the following:
Let’s briefly discuss how we would’ve submitted the ingestion task without using the script. You do not need to run these commands.
To submit the task, POST it to Druid in a new terminal window from the apache-druid-24.0.2 directory:
If the submission succeeds, the command prints the ID of the task:
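For a scripted version of the same submission, the following Python sketch builds the POST request and extracts the task ID from the response body. It assumes the Overlord task endpoint at `http://localhost:8081/druid/indexer/v1/task` and a JSON response of the form `{"task": "<id>"}`. The function names `build_submit_request` and `parse_task_id` are hypothetical.

```python
import json
import urllib.request


def build_submit_request(spec, overlord_url="http://localhost:8081"):
    """Build (but do not send) a POST request that submits an ingestion spec.

    Assumption: the Overlord listens at `overlord_url`; adjust as needed.
    """
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        f"{overlord_url}/druid/indexer/v1/task",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_task_id(response_body):
    """Extract the task ID from a response body shaped like {"task": "..."}."""
    return json.loads(response_body)["task"]
```

To send the request against a running cluster, pass it to `urllib.request.urlopen` and feed the response body to `parse_task_id`. Keeping request construction separate from sending makes it easy to inspect the URL, headers, and payload before anything reaches Druid.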
You can monitor the status of this task from the console as outlined above.
Querying your data
Once the data is loaded, follow the query tutorial to run some example queries on the newly loaded data.
To go through any of the other ingestion tutorials, first shut down the cluster and reset the cluster state by removing the contents of the directory under the Druid package. This is necessary because the other tutorials write to the same “wikipedia” datasource.