Automated cleanup for metadata records

    • Segment records
    • Audit records
    • Supervisor records
    • Rule records
    • Compaction configuration records
    • Datasource records created by supervisors
    • Indexer task logs

    When you delete some entities from Apache Druid, records related to the entity may remain in the metadata store. If you have a high datasource churn rate, meaning you frequently create and delete many short-lived datasources or other related entities like compaction configuration or rules, the leftover records can fill your metadata store and cause performance issues. To maintain metadata store performance, you can configure Apache Druid to automatically remove records associated with deleted entities from the metadata store.

    By default, Druid automatically cleans up metadata older than 90 days. This applies to all metadata entities in this topic except compaction configuration records and indexer task logs, for which cleanup is disabled by default. Where available, you can configure the retention period for each metadata type through the record's durationToRetain property. Certain records may require additional conditions to be satisfied before cleanup occurs.

    See the example for how you can customize the automated metadata cleanup for a specific use case.

    There are several cases when you should consider automated cleanup of the metadata related to deleted datasources:

    • If you know you have many high-churn datasources, for example, you have scripts that create and delete supervisors regularly.
    • If you have issues with the hard disk for your metadata database filling up.
    • If you run into performance issues with the metadata database. For example, API calls are very slow or fail to execute.

    If you have compliance requirements to keep audit records and you enable automated cleanup for audit records, use alternative methods to preserve audit metadata, for example, by periodically exporting audit metadata records to external storage.

    You can configure cleanup for each entity separately, as described in this section. Define the properties in the coordinator/runtime.properties file.

    The cleanup of one entity may depend on the cleanup of another entity as follows:

    • You have to configure a kill task for segment records before you can configure automated cleanup for rules or compaction configuration.
    • You have to schedule the metadata management tasks to run at the same or higher frequency as your most frequent cleanup job. For example, if your most frequent cleanup job is every hour, set the metadata store management period to one hour or less: druid.coordinator.period.metadataStoreManagementPeriod=PT1H, as in the sketch after this list.
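
    For example, a minimal coordinator/runtime.properties sketch of this scheduling rule, using audit cleanup as the illustrative hourly job (values are examples, not recommendations):

    # If the most frequent cleanup job runs hourly...
    druid.coordinator.kill.audit.period=PT1H
    # ...then the metadata store management period must be one hour or less
    druid.coordinator.period.metadataStoreManagementPeriod=PT1H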

    For details on configuration properties, see Metadata management. If you want to skip the details, check out the example at the end of this topic for configuring automated metadata cleanup.

    Segment records and segments in deep storage (kill task)

    Segment records and segments in deep storage become eligible for deletion when both of the following conditions hold:

    • They meet the eligibility requirement of the kill task datasource configuration according to killDataSourceWhitelist set in the Coordinator dynamic configuration.
    • The durationToRetain time has passed since their creation.

    Kill tasks use the following configuration:

    • druid.coordinator.kill.on: When true, enables the Coordinator to submit a kill task for unused segments, which deletes them completely from the metadata store and from deep storage. Only applies to the datasources specified in the dynamic configuration parameter killDataSourceWhitelist. If killDataSourceWhitelist is not set or is empty, kill tasks can be submitted for all datasources.
    • druid.coordinator.kill.period: Defines the frequency in ISO 8601 format for the cleanup job to check for and delete eligible segments. Defaults to P1D. Must be greater than druid.coordinator.period.indexingPeriod.
    • druid.coordinator.kill.durationToRetain: Defines the retention period in ISO 8601 format after creation that segments become eligible for deletion.
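
    For example, a coordinator/runtime.properties sketch that mirrors the example at the end of this topic (the retention and batch values are illustrative):

    # Daily kill task for unused segments older than four days
    druid.coordinator.kill.on=true
    druid.coordinator.kill.period=P1D
    druid.coordinator.kill.durationToRetain=P4D
    druid.coordinator.kill.maxSegments=1000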

    Audit records

    All audit records become eligible for deletion when the durationToRetain time has passed since their creation.

    Audit cleanup uses the following configuration:

    • druid.coordinator.kill.audit.on: When true, enables cleanup for audit records.
    • druid.coordinator.kill.audit.period: Defines the frequency in ISO 8601 format for the cleanup job to check for and delete eligible audit records. Defaults to P1D.
    • druid.coordinator.kill.audit.durationToRetain: Defines the retention period in ISO 8601 format after creation that audit records become eligible for deletion.
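
    For example, a sketch that keeps audit records for 30 days, matching the example at the end of this topic (the retention value is illustrative):

    druid.coordinator.kill.audit.on=true
    druid.coordinator.kill.audit.period=P1D
    druid.coordinator.kill.audit.durationToRetain=P30D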

    Supervisor records

    Supervisor records become eligible for deletion when the supervisor is terminated and the durationToRetain time has passed since their creation.

    Supervisor cleanup uses the following configuration:

    • druid.coordinator.kill.supervisor.on: When true, enables cleanup for supervisor records.
    • druid.coordinator.kill.supervisor.period: Defines the frequency in ISO 8601 format for the cleanup job to check for and delete eligible supervisor records. Defaults to P1D.
    • druid.coordinator.kill.supervisor.durationToRetain: Defines the retention period in ISO 8601 format after creation that supervisor records become eligible for deletion.
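
    For example, a sketch matching the example at the end of this topic (the retention value is illustrative):

    druid.coordinator.kill.supervisor.on=true
    druid.coordinator.kill.supervisor.period=P1D
    druid.coordinator.kill.supervisor.durationToRetain=P4D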

    Rules records

    Rule records become eligible for deletion when all segments for the datasource have been killed by the kill task and the durationToRetain time has passed since their creation. Automated cleanup for rules requires a kill task for segment records.

    Rule cleanup uses the following configuration:

    • druid.coordinator.kill.rule.on: When true, enables cleanup for rules records.
    • druid.coordinator.kill.rule.period: Defines the frequency in ISO 8601 format for the cleanup job to check for and delete eligible rules records. Defaults to P1D.
    • druid.coordinator.kill.rule.durationToRetain: Defines the retention period in ISO 8601 format after creation that rules records become eligible for deletion.
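
    For example, a sketch matching the example at the end of this topic (the retention value is illustrative); the segment kill task must also be configured:

    # Requires druid.coordinator.kill.on=true so that segments can be killed first
    druid.coordinator.kill.rule.on=true
    druid.coordinator.kill.rule.period=P1D
    druid.coordinator.kill.rule.durationToRetain=P4D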

    Compaction configuration records

    Druid retains all compaction configuration records by default, which should be suitable for most use cases. If you create and delete short-lived datasources with high frequency, and you set auto compaction configuration on those datasources, then consider turning on automated cleanup of compaction configuration records.

    Compaction configuration records in the druid_config table become eligible for deletion after all segments for the datasource have been killed by the kill task. Automated cleanup for compaction configuration requires a kill task for segment records.

    Compaction configuration cleanup uses the following configuration:

    • druid.coordinator.kill.compaction.on: When true, enables cleanup for compaction configuration records.
    • druid.coordinator.kill.compaction.period: Defines the frequency in ISO 8601 format for the cleanup job to check for and delete eligible compaction configuration records. Defaults to P1D.
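
    For example, a sketch matching the example at the end of this topic; note that there is no durationToRetain property for compaction configuration records:

    # Requires druid.coordinator.kill.on=true so that segments can be killed first
    druid.coordinator.kill.compaction.on=true
    druid.coordinator.kill.compaction.period=P1D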

    Datasource records created by supervisors

    Datasource records created by supervisors become eligible for deletion when the supervisor is terminated or does not exist in the druid_supervisors table and the durationToRetain time has passed since their creation.

    Datasource cleanup uses the following configuration:

    • druid.coordinator.kill.datasource.on: When true, enables cleanup of datasource records created by supervisors.
    • druid.coordinator.kill.datasource.period: Defines the frequency in ISO 8601 format for the cleanup job to check for and delete eligible datasource records. Defaults to P1D.
    • druid.coordinator.kill.datasource.durationToRetain: Defines the retention period in ISO 8601 format after creation that datasource records become eligible for deletion.
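
    For example, a sketch matching the example at the end of this topic (the retention value is illustrative):

    druid.coordinator.kill.datasource.on=true
    druid.coordinator.kill.datasource.period=P1D
    druid.coordinator.kill.datasource.durationToRetain=P4D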

    Indexer task logs

    You can configure the Overlord to delete indexer task log metadata and the indexer task logs from local disk or from cloud storage. Set these properties in the overlord/runtime.properties file.

    Indexer task log cleanup on the Overlord uses the following configuration:

    • druid.indexer.logs.kill.enabled: When true, enables cleanup of task logs.
    • druid.indexer.logs.kill.durationToRetain: Defines the length of time in milliseconds to retain task logs.
    • druid.indexer.logs.kill.initialDelay: Defines the length of time in milliseconds after the Overlord starts before it executes its first job to kill task logs.
    • druid.indexer.logs.kill.delay: The length of time in milliseconds between jobs to kill task logs.
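
    For example, a minimal overlord/runtime.properties sketch (the values are illustrative) that keeps task logs for 30 days, waits five minutes after the Overlord starts, and then runs the kill job every six hours; the millisecond conversions are shown in the comments:

    druid.indexer.logs.kill.enabled=true
    # 30 days = 30 * 24 * 60 * 60 * 1000 ms
    druid.indexer.logs.kill.durationToRetain=2592000000
    # 5 minutes = 300000 ms
    druid.indexer.logs.kill.initialDelay=300000
    # 6 hours = 21600000 ms
    druid.indexer.logs.kill.delay=21600000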

    For more detail, see the task logging configuration reference.

    Example configuration for automated metadata cleanup

    The following example configuration for the coordinator/runtime.properties file enables automated cleanup for segment records, audit records, supervisor records, rule records, compaction configuration records, and datasource records created by supervisors. It removes eligible segment, supervisor, rule, and datasource records that are more than four days old, retains audit records for 30 days, and cleans up compaction configuration records as soon as their datasource's segments have been killed:

    ...
    # Schedule the metadata store management task for every hour:
    druid.coordinator.period.metadataStoreManagementPeriod=PT1H
    # Set a kill task to poll every day to delete segment records and segments
    # in deep storage > 4 days old. When druid.coordinator.kill.on is set to true,
    # you can set killDataSourceWhitelist in the dynamic configuration to limit
    # the datasources that can be killed.
    druid.coordinator.kill.on=true
    druid.coordinator.kill.durationToRetain=P4D
    druid.coordinator.kill.maxSegments=1000
    # Poll every day to delete audit records > 30 days old
    druid.coordinator.kill.audit.on=true
    druid.coordinator.kill.audit.period=P1D
    druid.coordinator.kill.audit.durationToRetain=P30D
    # Poll every day to delete supervisor records > 4 days old
    druid.coordinator.kill.supervisor.on=true
    druid.coordinator.kill.supervisor.period=P1D
    druid.coordinator.kill.supervisor.durationToRetain=P4D
    # Poll every day to delete rules records > 4 days old
    druid.coordinator.kill.rule.on=true
    druid.coordinator.kill.rule.period=P1D
    druid.coordinator.kill.rule.durationToRetain=P4D
    # Poll every day to delete compaction configuration records
    druid.coordinator.kill.compaction.on=true
    druid.coordinator.kill.compaction.period=P1D
    # Poll every day to delete datasource records created by supervisors > 4 days old
    druid.coordinator.kill.datasource.on=true
    druid.coordinator.kill.datasource.period=P1D
    druid.coordinator.kill.datasource.durationToRetain=P4D
    ...

    See the following topics for more information:

    • Metadata management for metadata store configuration reference.
    • Metadata storage for an overview of the metadata storage database.