Data deletion

    Deletion by time range happens in two steps:

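    1. Mark the segments you want to delete as “unused”. This happens when a segment falls under a drop rule, or when you manually mark it unused through the Coordinator API or web console. This is a soft delete: the data is no longer available for querying, but the segment files remain in deep storage and the segment records remain in the metadata store.

    2. Once a segment is marked “unused”, you can use a kill task to permanently delete the segment file from deep storage and remove its record from the metadata store. This is a hard delete: the data is unrecoverable unless you have a backup.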

    For documentation on disabling segments using the Coordinator API, see the Coordinator API reference.
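    As a rough illustration of the soft-delete step, the sketch below builds the request that marks every segment in a time range unused. The datasource name "wikipedia" and the interval are hypothetical, and the endpoint path and body shape are assumptions based on the Coordinator API as commonly documented; confirm them against the API reference for your Druid version.

```python
import json


def mark_unused_request(datasource: str, interval: str) -> dict:
    """Describe the HTTP request that soft-deletes all segments in an interval.

    Assumption: the Coordinator exposes a markUnused endpoint per datasource
    that accepts an ISO 8601 interval in a JSON body.
    """
    return {
        "method": "POST",
        "path": f"/druid/coordinator/v1/datasources/{datasource}/markUnused",
        "body": {"interval": interval},
    }


# Hypothetical datasource and interval, for illustration only.
request = mark_unused_request("wikipedia", "2016-06-27/2016-06-28")
print(request["method"], request["path"])
print(json.dumps(request["body"]))
```

    The function only describes the request; sending it to a Coordinator with an HTTP client of your choice is left out so that the sketch has no side effects.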

    A data deletion tutorial is also available.

    Druid supports load and drop rules, which define intervals of time where data should be preserved and intervals where data should be discarded. Data that falls under a drop rule is marked unused, in the same manner as if you manually mark that time range unused. This is a fast, metadata-only operation.
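    To make the idea of a rule chain concrete, here is a minimal sketch that builds a two-rule list: keep recent data, then drop everything else. The one-month period, the replica count, and the "_default_tier" tier name are illustrative assumptions, and the rule type names follow the retention rule format as I understand it. Rules are evaluated in order, so the drop rule only applies to data that the load rule did not match.

```python
import json


def retention_rules(keep_period: str, replicants: int = 2) -> list:
    """Build a rule chain that keeps recent data and drops the rest.

    Assumption: "loadByPeriod" and "dropForever" are the rule types in use,
    and "_default_tier" is the tier name in your cluster.
    """
    return [
        # First rule: load (keep) data that falls within the most recent period.
        {
            "type": "loadByPeriod",
            "period": keep_period,
            "tieredReplicants": {"_default_tier": replicants},
        },
        # Second rule: anything not matched above is dropped (marked unused).
        {"type": "dropForever"},
    ]


rules = retention_rules("P1M")
print(json.dumps(rules, indent=2))
```

    Data dropped by such a rule is only soft-deleted; it remains in deep storage until it is permanently removed.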

    Druid supports deleting specific records using reindexing with a filter. The filter specifies which data remains after reindexing, so it must be the inverse of the data you want to delete. Because segments must be rewritten to delete data in this way, it can be a time-consuming operation.

    For example, to delete records where userName is 'bob' with native batch indexing, use a transformSpec with a "not" filter that matches every row except those where userName is 'bob'.
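    The inversion can be sketched as follows. The helper names (selector, keep_everything_except, matches) are hypothetical, and the small evaluator exists only to show locally which rows would survive; Druid applies the filter itself during ingestion. The JSON shape assumes the native "selector" and "not" filter types.

```python
import json


def selector(dimension: str, value: str) -> dict:
    """Filter that matches rows where `dimension` equals `value`."""
    return {"type": "selector", "dimension": dimension, "value": value}


def keep_everything_except(delete_filter: dict) -> dict:
    """Invert a 'rows to delete' filter into a 'rows to keep' filter."""
    return {"type": "not", "field": delete_filter}


def matches(filter_spec: dict, row: dict) -> bool:
    """Tiny local evaluator for the two filter types above (illustration only)."""
    if filter_spec["type"] == "selector":
        return row.get(filter_spec["dimension"]) == filter_spec["value"]
    if filter_spec["type"] == "not":
        return not matches(filter_spec["field"], row)
    raise ValueError(f"unsupported filter type: {filter_spec['type']}")


# The transformSpec keeps every row that is NOT userName = 'bob'.
transform_spec = {"filter": keep_everything_except(selector("userName", "bob"))}

rows = [{"userName": "bob"}, {"userName": "alice"}]
kept = [row for row in rows if matches(transform_spec["filter"], row)]
print(json.dumps(transform_spec))
print(kept)
```

    Writing the "delete" condition first and wrapping it in a "not" filter makes it harder to keep the wrong rows by accident.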

    To delete the same records using SQL, use REPLACE with WHERE userName <> 'bob'.
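    A minimal sketch of such a statement is built below as a string. The table name "wikipedia" and the DAY granularity are assumptions for illustration; OVERWRITE ALL and the PARTITIONED BY clause reflect the REPLACE syntax as I understand it, and the granularity should match how the table is already partitioned.

```python
def delete_rows_sql(table: str, keep_condition: str, granularity: str = "DAY") -> str:
    """Build a REPLACE statement that rewrites `table` with only the kept rows.

    `keep_condition` is the inverse of the rows to delete, for example
    "userName <> 'bob'" to delete rows where userName is 'bob'.
    """
    return (
        f'REPLACE INTO "{table}" OVERWRITE ALL\n'
        f'SELECT * FROM "{table}"\n'
        f"WHERE {keep_condition}\n"
        f"PARTITIONED BY {granularity}"
    )


# Hypothetical table name, for illustration only.
sql = delete_rows_sql("wikipedia", "userName <> 'bob'")
print(sql)
```

    The sketch interpolates its arguments directly into the statement, so it is only suitable for trusted, hand-written input.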

    To reindex using native batch, use the druid input source. If needed, a transformSpec can be used to filter or modify data during the reindexing job. To reindex with SQL, use REPLACE with SELECT ... FROM <table>. (Druid does not have UPDATE or ALTER TABLE statements.) Any SQL SELECT query can be used to filter, modify, or enrich the data during the reindexing job.
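    For the native batch route, the sketch below shows only the ioConfig portion of a reindexing task, reading from the same table it will overwrite. The datasource name and interval are hypothetical, and the "index_parallel" task type and field names are assumptions based on the native batch spec; a complete task also needs a dataSchema and tuningConfig, which are omitted here.

```python
import json


def reindex_io_config(datasource: str, interval: str) -> dict:
    """Build an ioConfig that reads existing segments through the druid input source."""
    return {
        "type": "index_parallel",
        "inputSource": {
            "type": "druid",
            "dataSource": datasource,
            "interval": interval,
        },
        # Overwrite the interval rather than appending to it.
        "appendToExisting": False,
    }


# Hypothetical datasource and interval, for illustration only.
io_config = reindex_io_config("wikipedia", "2016-06-27/2016-06-28")
print(json.dumps(io_config, indent=2))
```

    No input format is specified because the rows come from existing Druid segments rather than from external files.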

    Data that is deleted in this way is marked unused, but remains in deep storage. To permanently delete it, use a kill task.

    Deleting an entire table works the same way as deleting part of a table by time range. First, mark all segments unused using the Coordinator API or web console. Then, optionally, delete the data permanently using a kill task.
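    The two steps for a whole table can be sketched as a pair of request descriptions. The endpoint paths, the very wide "all time" interval, and the table name are assumptions for illustration, not values taken from this page; check the API reference before relying on them. Nothing is sent over the network here.

```python
import json

# Assumption: an interval wide enough to cover every segment in the table.
ALL_TIME = "1000-01-01/3000-01-01"


def delete_table_plan(datasource: str) -> list:
    """Describe the soft delete and the optional hard delete for a whole table."""
    return [
        # Step 1 (soft delete): mark every segment of the datasource unused.
        {
            "service": "coordinator",
            "method": "DELETE",
            "path": f"/druid/coordinator/v1/datasources/{datasource}",
            "body": None,
        },
        # Step 2 (hard delete, optional): submit a kill task for the unused segments.
        {
            "service": "overlord",
            "method": "POST",
            "path": "/druid/indexer/v1/task",
            "body": {"type": "kill", "dataSource": datasource, "interval": ALL_TIME},
        },
    ]


# Hypothetical table name, for illustration only.
plan = delete_table_plan("wikipedia")
for step in plan:
    print(step["method"], step["path"], json.dumps(step["body"]))
```

    Keeping the two steps separate mirrors the soft-delete and hard-delete distinction: after the first request the data can still be recovered, after the kill task it cannot.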

    Data that has been overwritten or soft-deleted still remains as segments that have been marked unused. You can use a kill task to permanently delete this data.

    The available grammar is: