Native batch simple task indexing
The simple task (task type `index`) executes single-threaded as a single task within the indexing service. For parallel, scalable alternatives, consider parallel task (`index_parallel`) ingestion or SQL-based batch ingestion.
A sample task is shown below:
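The following spec is a minimal sketch of a simple index task; the datasource name, file paths, column names, and interval are illustrative placeholders, not values from the original:

```json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "wikipedia",
      "timestampSpec": { "column": "timestamp", "format": "auto" },
      "dimensionsSpec": { "dimensions": ["page", "language"] },
      "granularitySpec": {
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "intervals": ["2015-09-12/2015-09-13"]
      }
    },
    "ioConfig": {
      "type": "index",
      "inputSource": { "type": "local", "baseDir": "/data", "filter": "wikiticker-*.json" },
      "inputFormat": { "type": "json" }
    },
    "tuningConfig": { "type": "index" }
  }
}
```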
See the dataSchema section of the ingestion docs for details.
If you do not specify intervals explicitly in your dataSchema's granularitySpec, the task does an extra pass over the data at startup to determine the time range to lock. If you do specify intervals explicitly, any rows with timestamps outside the specified intervals are thrown away. We recommend setting intervals explicitly when you know the time range of the data: it lets the task skip the extra pass, and it keeps you from accidentally replacing data outside that range if some stray rows have unexpected timestamps.
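For example, a granularitySpec with explicit intervals might look like the sketch below (the granularity and interval values are illustrative):

```json
"granularitySpec": {
  "segmentGranularity": "day",
  "queryGranularity": "none",
  "intervals": ["2015-09-12/2015-09-13"]
}
```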
The partitionsSpec describes the secondary partitioning method. Choose a partitionsSpec based on the rollup mode you want: for perfect rollup, use `hashed`; for best-effort rollup, use `dynamic`.
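As a sketch, a hashed partitionsSpec for perfect rollup might look like this (the shard count is an illustrative value):

```json
"partitionsSpec": { "type": "hashed", "numShards": 4 }
```

while a dynamic partitionsSpec for best-effort rollup might look like this (the row limit is likewise illustrative):

```json
"partitionsSpec": { "type": "dynamic", "maxRowsPerSegment": 5000000 }
```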
- Bulk pushing mode: Used for perfect rollup. Druid pushes every segment at the very end of the index task. Until then, Druid stores created segments in memory and on the local storage of the service running the index task. This mode can cause problems if you have limited storage capacity, and is not recommended for use in production. To enable bulk pushing mode, set `forceGuaranteedRollup` to true in your tuningConfig. You cannot use bulk pushing with `appendToExisting` set to true in your IOConfig.
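A tuningConfig enabling bulk pushing mode might look like the sketch below; it assumes a hashed partitionsSpec, since perfect rollup requires one, and the shard count is an illustrative value:

```json
"tuningConfig": {
  "type": "index",
  "forceGuaranteedRollup": true,
  "partitionsSpec": { "type": "hashed", "numShards": 4 }
}
```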