Ingestion troubleshooting FAQ
Druid ingested my events but they are not in my query results
If the number of ingested events seems correct, make sure your query is correctly formed. If you included a count aggregator in your ingestion spec, you will need to query for the results of this aggregate with a longSum aggregator. Issuing a query with a count aggregator counts the number of Druid rows after roll-up, which can be smaller than the number of ingested events.
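The difference between the two aggregators can be sketched as a native query built in Python. This is a minimal illustration, not a definitive query: the datasource name, the interval, the output names, and the assumption that the ingestion-time metric is itself named count are all hypothetical.

```python
import json


def event_count_query(datasource, interval, count_metric="count"):
    """Build a native timeseries query comparing ingested events with stored rows.

    `count_metric` is the name given to the count aggregator in the
    ingestion spec (assumed here to be "count").
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": [interval],
        "aggregations": [
            # longSum over the stored count metric: number of ingested events.
            {"type": "longSum", "name": "ingested_events", "fieldName": count_metric},
            # count aggregator at query time: number of Druid rows after roll-up.
            {"type": "count", "name": "druid_rows"},
        ],
    }


# Hypothetical datasource and interval, for illustration only.
query = event_count_query("my_datasource", "2024-01-01/2024-01-02")
print(json.dumps(query, indent=2))
```

If roll-up merged any events, ingested_events in the result will be larger than druid_rows.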
Where do my Druid segments end up after ingestion?
Depending on what druid.storage.type is set to, Druid uploads segments to the corresponding deep storage. Local disk is the default deep storage.
First, make sure there are no exceptions in the logs of the ingestion process. Also make sure that druid.storage.type is set to a deep storage other than local if you are running a distributed cluster.
Other common reasons that hand-off fails are as follows:
- Segments are corrupt and cannot be downloaded. You’ll see exceptions in your Historical processes if this occurs.
- Deep storage is improperly configured. Make sure that your segment actually exists in deep storage and that the Coordinator logs have no errors.
How do I get HDFS to work?
Make sure to include the HDFS storage extension along with all the Hadoop configuration and dependencies in the classpath. You can obtain the dependencies by running the command hadoop classpath on a machine where Hadoop has been set up. Also provide the necessary HDFS settings as described in the Druid documentation for HDFS deep storage.
How do I know when I can query Druid after submitting a batch ingestion task?
- Submit your ingestion task.
- Repeatedly poll the task status API (/druid/indexer/v1/task/{taskId}/status) until your task is shown to be successfully completed.
- Poll the Segment Loading by Datasource API (/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus) once with forceMetadataRefresh=true. (Note: forceMetadataRefresh=true refreshes the Coordinator’s metadata cache of all datasources. This can be a heavy operation in terms of load on the metadata store, but it is necessary to verify the load status of all the latest segments.) If some segments are not yet loaded, continue polling until they are; otherwise you can now query the data.
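The polling workflow above can be sketched in Python using only the standard library. Treat this as an illustration under stated assumptions rather than a reference client: the response shapes (a nested status.status field on the task endpoint and a map of datasource name to percent loaded on the load status endpoint), the use of forceMetadataRefresh=false on follow-up polls, and every function and parameter name are assumptions. Check them against the API documentation for your Druid version.

```python
import json
import time
import urllib.request


def get_json(url):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_until_queryable(overlord, coordinator, task_id, datasource,
                         fetch=get_json, poll_seconds=10, sleep=time.sleep):
    """Block until the task succeeded and its segments are reported as loaded."""
    # Poll the task status endpoint until the task finishes.
    status_url = f"{overlord}/druid/indexer/v1/task/{task_id}/status"
    while True:
        # Assumed response shape: {"status": {"status": "RUNNING" | "SUCCESS" | "FAILED"}}
        state = fetch(status_url)["status"]["status"]
        if state == "SUCCESS":
            break
        if state == "FAILED":
            raise RuntimeError(f"task {task_id} failed")
        sleep(poll_seconds)

    # Poll the load status endpoint, forcing a metadata refresh only once.
    load_url = f"{coordinator}/druid/coordinator/v1/datasources/{datasource}/loadstatus"
    force = "true"
    while True:
        # Assumed response shape: {"<datasource>": <percent of segments loaded>}
        loaded = fetch(f"{load_url}?forceMetadataRefresh={force}")
        if loaded.get(datasource, 0) >= 100:
            return
        # Assumption: later polls can rely on the already refreshed cache,
        # which avoids repeating the heavy metadata store operation.
        force = "false"
        sleep(poll_seconds)
```

A call might look like wait_until_queryable("http://overlord:8090", "http://coordinator:8081", task_id, "my_datasource"), where the hosts, ports, and datasource name are placeholders. The fetch and sleep parameters exist so the loop can be exercised without a running cluster.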
You can check the web console to make sure that your segments have actually loaded on Historical processes. If your segments are not present, check the Coordinator logs for messages about capacity or replication errors. One reason that segments are not downloaded is that Historical processes have maxSizes that are too small, making them incapable of downloading more data. You can raise these limits in the Historical configuration (for example, druid.server.maxSize and the maxSize of each druid.segmentCache.locations entry).
My queries are returning empty results
You can use a metadata query to list the dimensions and metrics that have been created for your datasource. Make sure that the names of the aggregators you use in your query match one of these metrics. Also make sure that the query interval you specify matches a valid time range where data exists.
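As an illustration of this check, the sketch below builds a segmentMetadata query and compares the fields referenced by a query's aggregators against the columns it reports. The source does not name the query type, so the choice of segmentMetadata is an assumption, as are the helper names, the sample datasource, and the shape of the metadata entry (a columns map keyed by column name).

```python
def segment_metadata_query(datasource, interval):
    """Build a native segmentMetadata query for one datasource and interval."""
    return {
        "queryType": "segmentMetadata",
        "dataSource": datasource,
        "intervals": [interval],
        # Merge per-segment results into a single analysis entry.
        "merge": True,
    }


def unknown_fields(query, metadata_entry):
    """Return aggregator fieldNames in `query` that are not existing columns.

    `metadata_entry` is assumed to be one element of the segmentMetadata
    response, with a "columns" map keyed by column name.
    """
    columns = set(metadata_entry.get("columns", {}))
    return [
        agg["fieldName"]
        for agg in query.get("aggregations", [])
        if "fieldName" in agg and agg["fieldName"] not in columns
    ]


# Hypothetical metadata entry and query, for illustration only.
metadata_entry = {"columns": {"__time": {}, "page": {}, "count": {}}}
user_query = {
    "aggregations": [
        {"type": "longSum", "name": "events", "fieldName": "count"},
        {"type": "longSum", "name": "edits", "fieldName": "edit_count"},
    ]
}
missing = unknown_fields(user_query, metadata_entry)
print(missing)
```

Any field listed in missing does not exist in the datasource for that interval, which is one common cause of empty results.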
Real-time ingestion seems to be stuck
There are a few ways this can occur. Druid will throttle ingestion to prevent out of memory problems if the intermediate persists are taking too long or if hand-off is taking too long. If your process logs indicate certain columns are taking a very long time to build (for example, if your segment granularity is hourly, but creating a single column takes 30 minutes), you should re-evaluate your configuration or scale up your real-time ingestion.
Data ingestion for Druid can be difficult for first time users. Please don’t hesitate to ask questions in the Druid Forum.