- segments records
- audit records
- supervisor records
- rule records
- compaction configuration records
If you have a high datasource churn rate, meaning you frequently create and delete many short-lived datasources or other related entities like compaction configuration or rules, the leftover records can start to fill your metadata store and cause performance issues.
To maintain metadata store performance in this case, you can configure Apache Druid to automatically remove records associated with deleted entities from the metadata store.
There are several cases when you should consider automated cleanup of the metadata related to deleted datasources:
- If you know you have many high-churn datasources, for example, you have scripts that create and delete supervisors regularly.
- If you have issues with the hard disk for your metadata database filling up.
- If you run into performance issues with the metadata database. For example, API calls are very slow or fail to execute.
If you have compliance requirements to keep audit records and you enable automated cleanup for audit records, use alternative methods to preserve audit metadata, for example, by periodically exporting audit metadata records to external storage.
By default, automatic cleanup for metadata is disabled. See the metadata store configuration reference for the default settings that apply after you enable the feature.
You can configure cleanup on a per-entity basis with the following constraints:
- You have to configure a kill task for segment records before you can configure automated cleanup for rules or compaction configuration.
- You have to configure the scheduler for the cleanup jobs to run at the same frequency or more frequently than your most frequent cleanup job. For example, if your most frequent cleanup job runs every hour, set the scheduler metadata store management period to one hour or less.
For details on configuration properties, see the metadata management configuration reference.
Segment records and segments in deep storage become eligible for deletion when both of the following conditions hold:
- They meet the eligibility requirement of the kill task datasource configuration according to `killDataSourceWhitelist` and `killAllDataSources` set in the Coordinator dynamic configuration. See Dynamic configuration.
- The `durationToRetain` time has passed since their creation.
Kill tasks use the following configuration:
- `druid.coordinator.kill.on`: When `true`, enables the Coordinator to submit kill tasks for unused segments, which deletes them completely from the metadata store and from deep storage. Only applies to the `dataSources` selected by the dynamic configuration: allowed datasources (`killDataSourceWhitelist`) or all datasources (`killAllDataSources`).
- `druid.coordinator.kill.period`: Defines the frequency in ISO 8601 format for the cleanup job to check for and delete eligible segments. Defaults to `P1D`. Must be greater than `druid.coordinator.period.indexingPeriod`.
- `druid.coordinator.kill.durationToRetain`: Defines, in ISO 8601 format, how long after creation segments become eligible for deletion.
- `druid.coordinator.kill.maxSegments`: Defines the maximum number of segments to delete per kill task.
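As an illustration, the kill task properties might be combined in the Coordinator runtime properties roughly as follows. This is a sketch rather than a recommended configuration: the six-hour check interval, 30-day retention, and cap of 100 segments are assumed values chosen only for the example.

```properties
# Let the Coordinator submit kill tasks for unused segments.
# Which datasources are affected is still governed by killDataSourceWhitelist
# or killAllDataSources in the Coordinator dynamic configuration.
druid.coordinator.kill.on=true

# Check for eligible segments every 6 hours (illustrative value).
# Keep this longer than druid.coordinator.period.indexingPeriod.
druid.coordinator.kill.period=PT6H

# Segments become eligible 30 days after creation (illustrative value).
druid.coordinator.kill.durationToRetain=P30D

# Delete at most 100 segments per kill task (illustrative value).
druid.coordinator.kill.maxSegments=100
```

A shorter period paired with a small `maxSegments` spreads deletion across many small kill tasks; a longer period with a larger cap does the same work in fewer, bigger tasks.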
Audit records
All audit records become eligible for deletion when the `durationToRetain` time has passed since their creation.
Audit cleanup uses the following configuration:
- `druid.coordinator.kill.audit.on`: When `true`, enables cleanup for audit records.
- Audit cleanup period: Defines the frequency in ISO 8601 format for the cleanup job to check for and delete eligible audit records. Defaults to `P1D`.
- `druid.coordinator.kill.audit.durationToRetain`: Defines, in ISO 8601 format, how long after creation audit records become eligible for deletion.
Supervisor records become eligible for deletion when the supervisor is terminated and the `durationToRetain` time has passed since their creation.
Supervisor cleanup uses the following configuration:
- `druid.coordinator.kill.supervisor.on`: When `true`, enables cleanup for supervisor records.
- `druid.coordinator.kill.supervisor.durationToRetain`: Defines, in ISO 8601 format, how long after creation supervisor records become eligible for deletion.
Rules records
Rule records become eligible for deletion when all segments for the datasource have been killed by the kill task and the `durationToRetain` time has passed since their creation. Automated cleanup for rules requires a kill task.
Rule cleanup uses the following configuration:
- `druid.coordinator.kill.rule.on`: When `true`, enables cleanup for rule records.
- `druid.coordinator.kill.rule.period`: Defines the frequency in ISO 8601 format for the cleanup job to check for and delete eligible rule records. Defaults to `P1D`.
- `druid.coordinator.kill.rule.durationToRetain`: Defines, in ISO 8601 format, how long after creation rule records become eligible for deletion.
Compaction configuration records in the `druid_config` table become eligible for deletion after all segments for the datasource have been killed by the kill task. Automated cleanup for compaction configuration requires a kill task.
Compaction configuration cleanup uses the following configuration:
- `druid.coordinator.kill.compaction.on`: When `true`, enables cleanup for compaction configuration records.
- `druid.coordinator.kill.compaction.period`: Defines the frequency in ISO 8601 format for the cleanup job to check for and delete eligible compaction configuration records. Defaults to `P1D`.
If you already have an extremely large compaction configuration, you may not be able to delete compaction configuration because of size limits on the audit log. In this case, you can set `druid.audit.manager.maxPayloadSizeBytes` and `druid.audit.manager.skipNullField` to avoid the auditing issue. See the configuration reference for these properties.
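A sketch of the two audit manager settings might look like the following. The byte limit is an arbitrary illustrative value, and the behavior described in the comments (oversized payloads being left out of the audit entry, null fields being omitted) is an assumption to verify against the audit logging reference for your Druid version.

```properties
# Cap the size of the payload stored with each audit entry.
# 1000000 bytes is an illustrative value, not a recommendation.
# Assumption: payloads above the cap are left out of the audit record.
druid.audit.manager.maxPayloadSizeBytes=1000000

# Assumption: omit fields with null values from the audit payload to shrink it.
druid.audit.manager.skipNullField=true
```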
Datasource records created by supervisors
Datasource records created by supervisors become eligible for deletion when the supervisor is terminated or does not exist in the `druid_supervisors` table and the `durationToRetain` time has passed since their creation.
Datasource cleanup uses the following configuration:
- `druid.coordinator.kill.datasource.on`: When `true`, enables cleanup of datasource records created by supervisors.
- `druid.coordinator.kill.datasource.period`: Defines the frequency in ISO 8601 format for the cleanup job to check for and delete eligible datasource records. Defaults to `P1D`.
- `druid.coordinator.kill.datasource.durationToRetain`: Defines, in ISO 8601 format, how long after creation datasource records become eligible for deletion.
You can configure the Overlord to delete indexer task log metadata and the indexer task logs from local disk or from cloud storage.
Indexer task log cleanup on the Overlord uses the following configuration:
- `druid.indexer.logs.kill.enabled`: When `true`, enables cleanup of task logs.
- `druid.indexer.logs.kill.durationToRetain`: Defines the length of time in milliseconds to retain task logs.
- `druid.indexer.logs.kill.initialDelay`: Defines the length of time in milliseconds after the Overlord starts before it executes its first job to kill task logs.
- `druid.indexer.logs.kill.delay`: Defines the length of time in milliseconds between jobs to kill task logs.
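Because these properties take milliseconds rather than ISO 8601 periods, it can help to see the arithmetic written out. The following sketch for the Overlord runtime properties assumes a four-day retention, a five-minute initial delay, and a six-hour interval between jobs; all three values are illustrative.

```properties
# Enable cleanup of indexer task logs and their metadata.
druid.indexer.logs.kill.enabled=true

# Retain task logs for 4 days: 4 * 24 * 60 * 60 * 1000 = 345600000 ms.
druid.indexer.logs.kill.durationToRetain=345600000

# Wait 5 minutes after the Overlord starts: 5 * 60 * 1000 = 300000 ms.
druid.indexer.logs.kill.initialDelay=300000

# Run the cleanup job every 6 hours: 6 * 60 * 60 * 1000 = 21600000 ms.
druid.indexer.logs.kill.delay=21600000
```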
For more detail, see the task logging configuration reference.
Consider a scenario where scripts create and delete hundreds of datasources and related entities a day, and you do not want leftover records to fill your metadata store. The datasources and related entities tend to persist for only one or two days, so you want a cleanup job that identifies and removes leftover records that are at least four days old. The exception is audit logs, which you need to retain for 30 days.
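A Coordinator configuration along the following lines could implement this scenario. Treat it as a sketch under stated assumptions. Two property names, `druid.coordinator.period.metadataStoreManagementPeriod` and `druid.coordinator.kill.audit.period`, are not spelled out in the text above; they are taken from the Druid Coordinator configuration reference and should be confirmed for your version. The `maxSegments` value is illustrative.

```properties
# Run the metadata store management scheduler every hour, which is at least
# as frequent as the most frequent cleanup job below (daily).
# Property name assumed from the Coordinator configuration reference.
druid.coordinator.period.metadataStoreManagementPeriod=PT1H

# Kill task: check daily and delete segment records and segments in deep
# storage that are more than 4 days old. Required for rule and compaction
# configuration cleanup.
druid.coordinator.kill.on=true
druid.coordinator.kill.period=P1D
druid.coordinator.kill.durationToRetain=P4D
# Illustrative cap on segments deleted per kill task.
druid.coordinator.kill.maxSegments=1000

# Audit records: check daily, retain for 30 days.
# The period property name is assumed from the configuration reference.
druid.coordinator.kill.audit.on=true
druid.coordinator.kill.audit.period=P1D
druid.coordinator.kill.audit.durationToRetain=P30D

# Supervisor records: delete when more than 4 days old.
druid.coordinator.kill.supervisor.on=true
druid.coordinator.kill.supervisor.durationToRetain=P4D

# Rule records: check daily, delete when more than 4 days old.
druid.coordinator.kill.rule.on=true
druid.coordinator.kill.rule.period=P1D
druid.coordinator.kill.rule.durationToRetain=P4D

# Compaction configuration records: check daily.
druid.coordinator.kill.compaction.on=true
druid.coordinator.kill.compaction.period=P1D

# Datasource records created by supervisors: check daily, delete when more
# than 4 days old.
druid.coordinator.kill.datasource.on=true
druid.coordinator.kill.datasource.period=P1D
druid.coordinator.kill.datasource.durationToRetain=P4D
```

With `druid.coordinator.kill.on` enabled, `killDataSourceWhitelist` or `killAllDataSources` in the Coordinator dynamic configuration still determines which datasources the kill task touches.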
See the following topics for more information:
- Metadata management for metadata store configuration reference.
- The metadata storage documentation for an overview of the metadata storage database.