Druid data model
Druid schemas must always include a primary timestamp. Druid uses the primary timestamp to partition and sort your data, and to rapidly identify and retrieve data within the time range of queries. Druid also uses the primary timestamp column for time-based data management operations such as dropping time chunks, overwriting time chunks, and time-based retention rules.
Druid parses the primary timestamp based on the timestamp configuration you supply at ingestion time. Regardless of the source field for the primary timestamp, Druid always stores the timestamp in the `__time` column in your Druid datasource.
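The following Python sketch illustrates the idea of mapping a source field onto the primary timestamp. It is not Druid code: the `timestamp_spec` dictionary is only shaped like a Druid timestampSpec, the input field name `ts` and the helper `to_stored_row` are hypothetical, and the sketch assumes ISO 8601 input and a timestamp held as milliseconds since the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Shaped like a Druid timestampSpec; "ts" is a hypothetical input field name.
timestamp_spec = {"column": "ts", "format": "iso"}


def to_stored_row(raw_row, spec):
    """Return a copy of raw_row with the source timestamp moved into __time."""
    if spec["format"] != "iso":
        raise ValueError("this sketch only handles ISO 8601 input")
    row = dict(raw_row)
    # Simplification: the source field is dropped once it has been parsed.
    text = row.pop(spec["column"])
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC when no offset
    # Integer milliseconds since the epoch, computed without float rounding.
    row["__time"] = (parsed - EPOCH) // timedelta(milliseconds=1)
    return row


stored = to_stored_row({"ts": "2024-01-01T00:00:00Z", "page": "home"}, timestamp_spec)
print(stored)
```

Whatever the input field is called, the resulting row carries its timestamp under the single name `__time`, which is what lets time-range filtering and time-chunk management work the same way for every datasource.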
Dimensions are columns that Druid stores “as-is”. You can use dimensions for any purpose. For example, you can group, filter, or apply aggregators to dimensions at query time when necessary.
If you disable rollup, then Druid treats the set of dimensions like a set of columns to ingest. The dimensions behave exactly as you would expect from any database that does not support a rollup feature.
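As a rough illustration, the Python snippet below builds a data schema fragment with rollup turned off and prints it as JSON. The datasource name `page_views` and the column names `ts`, `page`, `country`, and `bytes` are hypothetical, and the fragment is a sketch rather than a complete ingestion spec.

```python
import json

# Hypothetical datasource and column names, for illustration only.
data_schema = {
    "dataSource": "page_views",
    "timestampSpec": {"column": "ts", "format": "iso"},
    "dimensionsSpec": {
        # With rollup disabled, each dimension is simply a column to ingest.
        "dimensions": ["page", "country", {"type": "long", "name": "bytes"}]
    },
    "granularitySpec": {
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "rollup": False,  # store every input row individually
    },
}

print(json.dumps(data_schema, indent=2))
```

In this sketch a numeric column such as `bytes` is declared as a dimension, so it is stored row by row and can still be summed or averaged at query time.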
Metrics are columns that Druid stores in an aggregated form. Metrics are most useful when you enable rollup. If you specify a metric, you can apply an aggregation function to each row during ingestion. This has the following benefits:
- Rollup is a form of aggregation that collapses dimensions while aggregating the values in the metrics; that is, it collapses rows but retains their summary information.
- Druid can compute some aggregators, especially approximate ones, more quickly at query time if they are partially computed at ingestion time, including data that has not been rolled up.
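To make the effect of rollup concrete, here is a minimal Python simulation of it. It is not how Druid implements rollup internally: the function `roll_up`, the hourly truncation, and the `count` and `bytes_sum` metric names are assumptions chosen for the example. Rows that share the same truncated timestamp and dimension values collapse into one row, while the metrics keep the summary.

```python
from collections import defaultdict

HOUR_MS = 3_600_000


def roll_up(rows, dimensions, granularity_ms=HOUR_MS):
    """Collapse rows sharing a time bucket and dimension values.

    Each output row keeps two hypothetical metrics: a row count and the
    sum of the "bytes" field.
    """
    groups = defaultdict(lambda: {"count": 0, "bytes_sum": 0})
    for row in rows:
        # Truncate the timestamp to the start of its bucket.
        bucket = row["__time"] - row["__time"] % granularity_ms
        key = (bucket, tuple(row[d] for d in dimensions))
        groups[key]["count"] += 1
        groups[key]["bytes_sum"] += row["bytes"]

    rolled = []
    for (bucket, dim_values), metrics in sorted(groups.items(), key=lambda kv: kv[0]):
        out = {"__time": bucket}
        out.update(zip(dimensions, dim_values))
        out.update(metrics)
        rolled.append(out)
    return rolled


events = [
    {"__time": 1704067200000, "page": "home", "bytes": 100},
    {"__time": 1704067260000, "page": "home", "bytes": 250},
    {"__time": 1704067320000, "page": "docs", "bytes": 75},
    {"__time": 1704070800000, "page": "home", "bytes": 40},
]

rolled = roll_up(events, dimensions=["page"])
for row in rolled:
    print(row)
```

Four input rows become three stored rows here: the two `home` events in the first hour merge into a single row with a count of 2 and a byte sum of 350. The individual events can no longer be recovered, which is the trade-off rollup makes for smaller storage and less work at query time.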