Native batch simple task indexing

    A sample task is shown below:
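    The following is a minimal sketch of a simple index task spec. The datasource name, input paths, columns, metrics, and interval are illustrative assumptions, not values taken from a real deployment.

    ```json
    {
      "type": "index",
      "spec": {
        "dataSchema": {
          "dataSource": "wikipedia",
          "timestampSpec": {
            "column": "timestamp",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": ["country", "page", "language"]
          },
          "metricsSpec": [
            { "type": "count", "name": "count" },
            { "type": "doubleSum", "name": "added", "fieldName": "added" }
          ],
          "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "DAY",
            "queryGranularity": "NONE",
            "intervals": ["2013-08-31/2013-09-01"]
          }
        },
        "ioConfig": {
          "type": "index",
          "inputSource": {
            "type": "local",
            "baseDir": "examples/indexing/",
            "filter": "wikipedia_data.json"
          },
          "inputFormat": {
            "type": "json"
          }
        },
        "tuningConfig": {
          "type": "index",
          "partitionsSpec": {
            "type": "dynamic"
          }
        }
      }
    }
    ```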


    If you do not specify intervals explicitly in your dataSchema’s granularitySpec, the Local Index Task will do an extra pass over the data to determine the range to lock when it starts up. If you specify intervals explicitly, any rows outside the specified intervals are thrown away. We recommend setting intervals explicitly if you know the time range of the data: it allows the task to skip the extra pass, and it prevents you from accidentally replacing data outside that range if some stray data has unexpected timestamps.
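    As a sketch, a granularitySpec with explicit intervals might look like the following (the granularities and interval value are illustrative); rows outside the listed interval would be dropped:

    ```json
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "DAY",
      "queryGranularity": "NONE",
      "intervals": ["2013-08-31/2013-09-01"]
    }
    ```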

    | property | description | default | required? |
    |---|---|---|---|
    | type | The task type. This should always be "index". | none | yes |
    | inputFormat | inputFormat to specify how to parse input data. | none | yes |
    | appendToExisting | Creates segments as additional shards of the latest version, effectively appending to the segment set instead of replacing it. This means that you can append new segments to any datasource regardless of its original partitioning scheme. You must use the dynamic partitioning type for the appended segments. If you specify a different partitioning type, the task fails with an error. | false | no |
    | dropExisting | If this setting is false, then ingestion proceeds as usual. Set this to true and appendToExisting to false to enforce true "replace" functionality as described next. If dropExisting is true, appendToExisting is false, and the granularitySpec contains at least one interval, then the ingestion task will create regular segments for time chunk intervals with input data and tombstones for all other time chunks with no data. The task publishes the data segments and the tombstone segments together when it publishes new segments. The net effect of the data segments and the tombstones is "replace" semantics: the input data contained in the granularitySpec intervals replaces all existing data in those intervals, even for time chunks that would otherwise be empty because no input data was associated with them. In the extreme case, when the input data set that falls within the intervals is empty, all existing data in the intervals is replaced with an empty data set (that is, all existing data is covered by tombstones). If ingestion fails, no segments and no tombstones are published. The following two combinations are not supported and will make the ingestion fail with an error: dropExisting is true and interval is not specified in granularitySpec, or appendToExisting is true and dropExisting is true. See the example after this table. WARNING: this functionality is still in beta; even though we are not aware of any bugs, use it with caution. | false | no |
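    As a sketch of the "replace" combination described above, an ioConfig could set dropExisting to true together with appendToExisting set to false; the granularitySpec must also list at least one interval. The input source path and file name are illustrative assumptions.

    ```json
    "ioConfig": {
      "type": "index",
      "inputSource": {
        "type": "local",
        "baseDir": "examples/indexing/",
        "filter": "wikipedia_data.json"
      },
      "inputFormat": { "type": "json" },
      "appendToExisting": false,
      "dropExisting": true
    }
    ```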

    The tuningConfig is optional and default parameters will be used if no tuningConfig is specified. See below for more details.

    | property | description | default | required? |
    |---|---|---|---|
    | type | This should always be hashed. | none | yes |
    | maxRowsPerSegment | Used in sharding. Determines how many rows are in each segment. | 5000000 | no |
    | numShards | Directly specify the number of shards to create. If this is specified and intervals is specified in the granularitySpec, the index task can skip the determine intervals/partitions pass through the data. numShards cannot be specified if maxRowsPerSegment is set. | null | no |
    | partitionDimensions | The dimensions to partition on. Leave blank to select all dimensions. | null | no |
    | partitionFunction | A function to compute the hash of the partition dimensions. See the hash partition function documentation. | murmur3_32_abs | no |
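    As a sketch, a tuningConfig using hashed partitioning with the properties above might look like the following. The shard count and partition dimension are illustrative assumptions; note that numShards and maxRowsPerSegment cannot both be set.

    ```json
    "tuningConfig": {
      "type": "index",
      "partitionsSpec": {
        "type": "hashed",
        "numShards": 4,
        "partitionDimensions": ["country"],
        "partitionFunction": "murmur3_32_abs"
      }
    }
    ```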

    For best-effort rollup, you should use dynamic.

    | Field | Type | Description | Required |
    |---|---|---|---|
    | type | String | See the partitioning documentation for an explanation and the available options. | yes |
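    A minimal sketch of a tuningConfig with a dynamic partitionsSpec for best-effort rollup follows. The maxRowsPerSegment setting is an assumption based on the default listed above and may be omitted.

    ```json
    "tuningConfig": {
      "type": "index",
      "partitionsSpec": {
        "type": "dynamic",
        "maxRowsPerSegment": 5000000
      }
    }
    ```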

    While ingesting data using simple task indexing, Druid creates segments from the input data and pushes them. For segment pushing, the simple task index supports the following segment pushing modes based upon your type of rollup:

    • Incremental pushing mode: Used for best-effort rollup. Druid pushes segments incrementally during the course of the indexing task. The index task collects data and stores created segments in the memory and disks of the service running the task until the total number of collected rows exceeds maxTotalRows. At that point the index task immediately pushes all segments created up until that moment, cleans up the pushed segments, and continues to ingest the remaining data.
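    As a sketch, the row thresholds that drive incremental pushing are set in the tuningConfig; the values below are illustrative assumptions, not recommendations.

    ```json
    "tuningConfig": {
      "type": "index",
      "maxRowsInMemory": 1000000,
      "partitionsSpec": {
        "type": "dynamic",
        "maxTotalRows": 20000000
      }
    }
    ```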