SQL data types
Columns in Druid are associated with a specific data type. This topic describes supported data types in Druid SQL.
Druid natively supports five basic column types: “long” (64-bit signed integer), “float” (32-bit float), “double” (64-bit float), “string” (UTF-8 encoded strings and string arrays), and “complex” (a catch-all for more exotic data types such as hyperUnique and approxHistogram columns).
The following table describes how Druid maps SQL types onto native types at query runtime. Casts between two SQL types that have the same Druid runtime type have no effect, apart from the exceptions noted in the table. Casts between two SQL types that have different Druid runtime types generate a runtime cast in Druid. If a value cannot be properly cast to another value, as in CAST('foo' AS BIGINT), the runtime substitutes a default value. NULL values cast to non-nullable types are also replaced with a default value (for example, nulls cast to numbers are converted to zeroes).
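The substitution rule above can be sketched in Python. This is only an illustrative model, not Druid's implementation: the function name cast_to_bigint and the constant DEFAULT_LONG are hypothetical, and the parsing rules are Python's int(), which may accept or reject inputs differently from Druid.

```python
from typing import Optional

# Hypothetical constant: the text above says unparseable values and NULLs
# cast to numbers are replaced with a default value, which is zero.
DEFAULT_LONG = 0


def cast_to_bigint(value: Optional[str]) -> int:
    """Model CAST(value AS BIGINT) with default-value substitution."""
    if value is None:
        # NULL cast to a non-nullable numeric type becomes the default.
        return DEFAULT_LONG
    try:
        # Python's own integer parsing stands in for Druid's parser here.
        return int(value)
    except ValueError:
        # A value that cannot be cast, such as 'foo', becomes the default.
        return DEFAULT_LONG


results = [cast_to_bigint(v) for v in ("42", "foo", None)]
```

In this model, a failed cast is indistinguishable from a genuine zero, so a query that must tell the two apart would need to validate the input separately.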
Druid’s native type system allows strings to potentially have multiple values. These are reported in SQL as VARCHAR typed, and can be used syntactically like any other VARCHAR. Regular string functions that refer to multi-value string dimensions are applied to all values for each row individually. Multi-value string dimensions can also be treated as arrays via special multi-value string functions, which can perform powerful array-aware operations.
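As a rough illustration of "applied to all values for each row individually", the following Python sketch models one row of a multi-value dimension as a list of strings. The helper apply_per_value is hypothetical and only mirrors the described behavior; it says nothing about how Druid evaluates the function internally.

```python
from typing import Callable, List


def apply_per_value(fn: Callable[[str], str], row_values: List[str]) -> List[str]:
    """Apply a scalar string function to every value held by one row."""
    return [fn(value) for value in row_values]


# One row whose multi-value dimension holds two values.
row = ["a", "b"]

# A regular string function (upper-casing) touches each value separately.
upper_row = apply_per_value(str.upper, row)
```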
Because multi-value dimensions are treated by the SQL planner as VARCHAR, there are some inconsistencies between how they are handled in Druid SQL and in native queries. For example, the Druid SQL planner may incorrectly optimize expressions involving multi-value dimensions: multi_val_dim = 'a' AND multi_val_dim = 'b' will be optimized to false, even though it is possible for a single row to have both “a” and “b” as values for that dimension. The SQL behavior of multi-value dimensions will change in a future release to more closely align with their behavior in native queries.
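The mismatch can be made concrete with a small Python model. It assumes that a native equality filter on a multi-value dimension matches a row when any of the row's values equals the literal; the function names are hypothetical and the code is a sketch of the two behaviors, not of Druid's planner.

```python
from typing import List


def native_equals(row_values: List[str], literal: str) -> bool:
    # Assumed native semantics: the row matches if ANY value equals the literal.
    return literal in row_values


def native_filter(row_values: List[str]) -> bool:
    # multi_val_dim = 'a' AND multi_val_dim = 'b', evaluated per row.
    return native_equals(row_values, "a") and native_equals(row_values, "b")


def planner_folded_filter(row_values: List[str]) -> bool:
    # Treating the column as a single-valued VARCHAR, x = 'a' AND x = 'b'
    # can never hold, so the expression is folded to a constant false.
    return False


row = ["a", "b"]
matches_native = native_filter(row)
matches_folded = planner_folded_filter(row)
```

For the row holding both values, the per-row evaluation matches while the folded constant does not, which is the inconsistency described above.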
The druid.generic.useDefaultValueForNull runtime property controls Druid’s NULL handling mode.
In SQL compatible mode, NULLs are treated more closely to the SQL standard. The property affects both storage and querying, so for correct behavior it should be set on all Druid service types so that it is available at both ingestion time and query time. The ability to handle NULLs carries some overhead; see the documentation for more details.