Tutorial: Deleting data

    For this tutorial, we’ll assume you’ve already downloaded Apache Druid as described in the single-machine quickstart and have it running on your local machine.

    In this tutorial, we will use the Wikipedia edits data, with an indexing spec that creates hourly segments. This spec is located at , and it creates a datasource called deletion-tutorial.

    Let’s load this initial data:

    When the load finishes, open http://localhost:8888/unified-console.html#datasources in a browser.

    Permanent deletion of a Druid segment has two steps:

    1. The segment must first be marked as “unused”. This occurs when a user manually disables a segment through the Coordinator API.
    2. After segments have been marked as “unused”, a Kill Task will delete any “unused” segments from Druid’s metadata store as well as deep storage.

    Let’s mark some segments as “unused” now, using the Coordinator API, first by interval and then by segment ID.

    curl -X 'POST' -H 'Content-Type:application/json' -d '{ "interval" : "2015-09-12T18:00:00.000Z/2015-09-12T20:00:00.000Z" }' http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/markUnused

    After that command completes, you should see that the segments for hours 18 and 19 have been disabled.

    Note that the hour 18 and 19 segments are still present in deep storage.
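One way to check this from a shell is to list the datasource's directory under local deep storage. The sketch below assumes the quickstart's local deep storage path var/druid/segments/deletion-tutorial (the path used later in this tutorial) and that it runs from the Druid package root. list_deep_storage is a hypothetical helper name, not a Druid command.

```shell
# Assumed local deep storage directory for the tutorial datasource.
SEGMENT_DIR="var/druid/segments/deletion-tutorial"

# Print one interval directory per line, or fail with a message
# if the directory does not exist (for example, wrong working directory).
list_deep_storage() {
  if [ -d "$1" ]; then
    ls -1 "$1"
  else
    echo "no such directory: $1" >&2
    return 1
  fi
}
```

Calling list_deep_storage "$SEGMENT_DIR" at this point should still show the directories for the hour 18 and hour 19 intervals, because marking a segment as "unused" does not touch deep storage.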

    Next, let’s disable some segments by segment ID. This again marks the segments as “unused” but does not remove them from deep storage. You can find a segment’s full ID in the UI, as explained below.

    In the segments view, click the arrow on the left side of one of the remaining segments to expand the segment entry:

    Let’s disable the hour 13 and 14 segments by sending a POST request to the Coordinator with this payload:

    {
      "segmentIds":
      [
        "deletion-tutorial_2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z_2019-05-01T17:38:46.961Z",
        "deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-05-01T17:38:46.961Z"
      ]
    }
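The trailing timestamp in each ID above is the segment version, which Druid assigns at ingestion time, so the IDs in your own cluster will likely differ from the ones printed here. As a rough sketch, the payload can be assembled from the version shown in the segments view. segment_id and disable_payload are hypothetical helper names, and the sketch assumes the simple datasource_start_end_version ID shape shown above, with no partition suffix.

```shell
# Compose one segment ID in the <datasource>_<start>_<end>_<version> shape.
segment_id() {
  printf '%s_%s_%s_%s' "$1" "$2" "$3" "$4"
}

# Emit a markUnused payload for the hour 13 and hour 14 segments.
# $1 = datasource name, $2 = version string copied from the segments view.
disable_payload() {
  printf '{"segmentIds":["%s","%s"]}\n' \
    "$(segment_id "$1" 2015-09-12T13:00:00.000Z 2015-09-12T14:00:00.000Z "$2")" \
    "$(segment_id "$1" 2015-09-12T14:00:00.000Z 2015-09-12T15:00:00.000Z "$2")"
}

# Prints the payload for the version used in this tutorial's example.
disable_payload deletion-tutorial 2019-05-01T17:38:46.961Z
```

Redirecting the output to a file gives a payload that matches your own ingestion run.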

    The JSON payload for hours 13 and 14 is provided at quickstart/tutorial/deletion-disable-segments.json. Submit it to the Coordinator in a POST request.
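A minimal sketch of that request follows. It reuses the markUnused endpoint and the localhost:8081 address from the interval example earlier, and assumes it runs from the Druid package root so that the relative payload path resolves. mark_unused_from_file is a hypothetical wrapper name.

```shell
# Assumed quickstart Coordinator address and tutorial datasource.
COORDINATOR_URL="http://localhost:8081"
DATASOURCE="deletion-tutorial"
PAYLOAD_FILE="quickstart/tutorial/deletion-disable-segments.json"

# POST a JSON payload file to the markUnused endpoint.
# Refuses to call the Coordinator if the file cannot be read.
mark_unused_from_file() {
  if [ ! -r "$1" ]; then
    echo "payload file not readable: $1" >&2
    return 1
  fi
  # curl's "-d @file" sends the contents of the file as the request body.
  curl -X POST -H 'Content-Type: application/json' \
    -d @"$1" \
    "$COORDINATOR_URL/druid/coordinator/v1/datasources/$DATASOURCE/markUnused"
}
```

With Druid running, mark_unused_from_file "$PAYLOAD_FILE" sends the request.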

    After that command completes, you should see that the segments for hours 13 and 14 have been disabled.

    Note that the hour 13 and 14 segments are still in deep storage:

    $ ls -l1 var/druid/segments/deletion-tutorial/
    2015-09-12T00:00:00.000Z_2015-09-12T01:00:00.000Z
    2015-09-12T01:00:00.000Z_2015-09-12T02:00:00.000Z
    2015-09-12T03:00:00.000Z_2015-09-12T04:00:00.000Z
    2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z
    2015-09-12T05:00:00.000Z_2015-09-12T06:00:00.000Z
    2015-09-12T06:00:00.000Z_2015-09-12T07:00:00.000Z
    2015-09-12T07:00:00.000Z_2015-09-12T08:00:00.000Z
    2015-09-12T08:00:00.000Z_2015-09-12T09:00:00.000Z
    2015-09-12T09:00:00.000Z_2015-09-12T10:00:00.000Z
    2015-09-12T10:00:00.000Z_2015-09-12T11:00:00.000Z
    2015-09-12T11:00:00.000Z_2015-09-12T12:00:00.000Z
    2015-09-12T12:00:00.000Z_2015-09-12T13:00:00.000Z
    2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z
    2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z
    2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z
    2015-09-12T17:00:00.000Z_2015-09-12T18:00:00.000Z
    2015-09-12T19:00:00.000Z_2015-09-12T20:00:00.000Z
    2015-09-12T20:00:00.000Z_2015-09-12T21:00:00.000Z
    2015-09-12T21:00:00.000Z_2015-09-12T22:00:00.000Z
    2015-09-12T22:00:00.000Z_2015-09-12T23:00:00.000Z
    2015-09-12T23:00:00.000Z_2015-09-13T00:00:00.000Z

    Now that we have disabled some segments, we can submit a Kill Task, which will delete the disabled segments from metadata and deep storage.
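As a hedged sketch, a Kill Task can be described by a small JSON spec naming the datasource and an interval, and submitted to the Overlord's task endpoint. The interval 2015-09-12/2015-09-13 (the whole tutorial day) and the localhost:8081 address are assumptions based on the quickstart setup, and kill_task_spec and submit_kill_task are hypothetical helper names.

```shell
# Assumed address: in the single-machine quickstart the Overlord
# is reached on the same port as the Coordinator.
OVERLORD_URL="http://localhost:8081"

# Build a kill task spec. Only segments within the interval that are
# already marked as unused are deleted by such a task.
kill_task_spec() {
  printf '{"type":"kill","dataSource":"%s","interval":"%s"}\n' "$1" "$2"
}

# Submit the spec; "-d @-" tells curl to read the request body from stdin.
submit_kill_task() {
  kill_task_spec "$1" "$2" |
    curl -X POST -H 'Content-Type: application/json' -d @- \
      "$OVERLORD_URL/druid/indexer/v1/task"
}

# Print the spec for inspection before submitting anything.
kill_task_spec deletion-tutorial 2015-09-12/2015-09-13
```

With Druid running, submit_kill_task deletion-tutorial 2015-09-12/2015-09-13 submits the task. Because the deletion is permanent, printing the spec first is a cheap way to double-check the interval.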

    After this task completes, you can see that the disabled segments have now been removed from deep storage:

    $ ls -l1 var/druid/segments/deletion-tutorial/
    2015-09-12T12:00:00.000Z_2015-09-12T13:00:00.000Z
    2015-09-12T15:00:00.000Z_2015-09-12T16:00:00.000Z
    2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z
    2015-09-12T17:00:00.000Z_2015-09-12T18:00:00.000Z
    2015-09-12T20:00:00.000Z_2015-09-12T21:00:00.000Z
    2015-09-12T21:00:00.000Z_2015-09-12T22:00:00.000Z
    2015-09-12T23:00:00.000Z_2015-09-13T00:00:00.000Z