SQL scalar functions
Druid SQL provides scalar functions for numeric and string operations, IP addresses, sketches, and more, as described on this page.
For mathematical operations, Druid SQL uses integer math if all operands in an expression are integers. Otherwise, Druid switches to floating point math. You can force floating point math by casting one of your operands to FLOAT. At runtime, Druid widens 32-bit floats to 64-bit for most expressions.
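The promotion rule above can be modelled outside Druid. The following Python sketch only illustrates the rule as described here and is not Druid code. The helper name `sql_divide` is hypothetical, and truncation toward zero for integer division is an assumption borrowed from Java long arithmetic. Division by zero is not modelled.

```python
def sql_divide(a, b):
    """Model of the rule: integer math only when every operand is an integer."""
    if isinstance(a, int) and isinstance(b, int):
        # Assumption: integer division truncates toward zero, as Java longs do.
        quotient = abs(a) // abs(b)
        return quotient if (a >= 0) == (b >= 0) else -quotient
    # Any non-integer operand switches the whole expression to floating point.
    return float(a) / float(b)


print(sql_divide(7, 2))         # both operands are integers: integer math
print(sql_divide(float(7), 2))  # one operand is a float: floating point math
```

Converting one operand with `float(...)` plays the role that `CAST(... AS FLOAT)` plays in the paragraph above.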
String functions
String functions accept strings, and return a type appropriate to the function.
Function | Notes |
---|---|
CONCAT(expr, expr…) | Concatenates a list of expressions. |
TEXTCAT(expr, expr) | Two-argument version of CONCAT. |
STRING_FORMAT(pattern, [args…]) | Returns a string formatted in the manner of Java’s String.format. |
LENGTH(expr) | Length of expr in UTF-16 code units. |
CHAR_LENGTH(expr) | Alias for LENGTH. |
CHARACTER_LENGTH(expr) | Alias for LENGTH. |
STRLEN(expr) | Alias for LENGTH. |
LOOKUP(expr, lookupName) | Looks up expr in a registered lookup. Note that lookups can also be queried directly using the lookup schema. |
LOWER(expr) | Returns expr in all lowercase. |
UPPER(expr) | Returns expr in all uppercase. |
PARSE_LONG(string, [radix]) | Parses a string into a long (BIGINT) with the given radix, or 10 (decimal) if a radix is not provided. |
POSITION(needle IN haystack [FROM fromIndex]) | Returns the index of needle within haystack, with indexes starting from 1. The search will begin at fromIndex, or 1 if fromIndex is not specified. If needle is not found, returns 0. |
REGEXP_EXTRACT(expr, pattern, [index]) | Applies regular expression pattern to expr and extracts a capture group, or returns NULL if there is no match. If index is unspecified or zero, returns the first substring that matched the pattern. The pattern may match anywhere inside expr; if you want to match the entire string instead, use the ^ and $ markers at the start and end of your pattern. Note: when druid.generic.useDefaultValueForNull = true, it is not possible to differentiate an empty-string match from a non-match (both will return NULL). |
REGEXP_LIKE(expr, pattern) | Returns whether expr matches regular expression pattern. The pattern may match anywhere inside expr; if you want to match the entire string instead, use the ^ and $ markers at the start and end of your pattern. Similar to LIKE, but uses regexps instead of LIKE patterns. Especially useful in WHERE clauses. |
CONTAINS_STRING(expr, str) | Returns true if str is a substring of expr. |
ICONTAINS_STRING(expr, str) | Returns true if str is a substring of expr. The match is case-insensitive. |
REPLACE(expr, pattern, replacement) | Replaces pattern with replacement in expr, and returns the result. |
STRPOS(haystack, needle) | Returns the index of needle within haystack, with indexes starting from 1. If needle is not found, returns 0. |
SUBSTRING(expr, index, [length]) | Returns a substring of expr starting at index, with a max length, both measured in UTF-16 code units. |
RIGHT(expr, [length]) | Returns the rightmost length characters from expr. |
LEFT(expr, [length]) | Returns the leftmost length characters from expr. |
SUBSTR(expr, index, [length]) | Alias for SUBSTRING. |
TRIM([BOTH | LEADING | TRAILING] [chars FROM] expr) | Returns expr with characters removed from the leading, trailing, or both ends of expr if they are in chars. If chars is not provided, it defaults to ' ' (a space). If the directional argument is not provided, it defaults to BOTH. |
BTRIM(expr, [chars]) | Alternate form of TRIM(BOTH chars FROM expr). |
LTRIM(expr, [chars]) | Alternate form of TRIM(LEADING chars FROM expr). |
RTRIM(expr, [chars]) | Alternate form of TRIM(TRAILING chars FROM expr). |
REVERSE(expr) | Reverses expr. |
REPEAT(expr, [N]) | Repeats expr N times. |
LPAD(expr, length, [chars]) | Returns a string of length from expr left-padded with chars. If length is shorter than the length of expr, the result is expr truncated to length. The result is null if either expr or chars is null. If chars is an empty string, no padding is added; however, expr may be trimmed if necessary. |
RPAD(expr, length, [chars]) | Returns a string of length from expr right-padded with chars. If length is shorter than the length of expr, the result is expr truncated to length. The result is null if either expr or chars is null. If chars is an empty string, no padding is added; however, expr may be trimmed if necessary. |
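A few of the conventions in the table above are easy to get wrong: lengths counted in UTF-16 code units, 1-based indexes with 0 meaning "not found", and padding that can also truncate. The Python sketch below models those three behaviours as they are described in the table. It is not Druid code; the function names simply mirror the SQL names, the `chars` argument is required here, and keeping the leading characters on truncation is an assumption. Python strings index by code point, so `strpos` and `lpad` in this sketch differ from Druid for characters outside the Basic Multilingual Plane.

```python
def length(expr):
    """LENGTH: number of UTF-16 code units, not Unicode code points."""
    return len(expr.encode("utf-16-le")) // 2


def strpos(haystack, needle):
    """STRPOS: 1-based index of needle, or 0 when it is absent."""
    return haystack.find(needle) + 1  # str.find returns -1 on a miss


def lpad(expr, target_length, chars):
    """LPAD as described in the table above."""
    if expr is None or chars is None:
        return None
    if len(expr) >= target_length or chars == "":
        # Assumption: truncation keeps the leading characters of expr.
        return expr[:max(target_length, 0)]
    missing = target_length - len(expr)
    padding = (chars * missing)[:missing]
    return padding + expr


print(length("druid"))              # 5
print(length("a\U0001F600"))        # 3: the emoji needs two UTF-16 code units
print(strpos("druid sql", "sql"))   # 7
print(strpos("druid sql", "xyz"))   # 0
print(lpad("42", 5, "0"))           # 00042
print(lpad("druid", 3, "0"))        # dru
```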
Date and time functions
Time functions can be used with:
- Druid’s primary timestamp column, __time;
- String timestamps, through the TIME_PARSE function.
Literal timestamps in the connection time zone can be written using TIMESTAMP '2000-01-01 00:00:00' syntax. The simplest way to write literal timestamps in other time zones is to use TIME_PARSE, like TIME_PARSE('2000-02-01 00:00:00', NULL, 'America/Los_Angeles').
The best ways to filter based on time are by using ISO8601 intervals, like TIME_IN_INTERVAL(__time, '2000-01-01/2000-02-01'), or by using literal timestamps with the >= and < operators, like __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-02-01 00:00:00'.
Druid supports the standard SQL BETWEEN operator, but we recommend avoiding it for time filters. BETWEEN is inclusive of its upper bound, which makes it awkward to write time filters correctly. For example, the equivalent of TIME_IN_INTERVAL(__time, '2000-01-01/2000-02-01') is __time BETWEEN TIMESTAMP '2000-01-01 00:00:00' AND TIMESTAMP '2000-01-31 23:59:59.999'.
Druid processes timestamps internally as longs (64-bit integers) representing milliseconds since the epoch. Therefore, time functions perform best when used with the primary timestamp column, or with timestamps stored in long columns as milliseconds and accessed with MILLIS_TO_TIMESTAMP. Other timestamp representations, including string timestamps and POSIX timestamps (seconds since the epoch), require query-time conversion to Druid’s internal form, which adds overhead.
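The difference between a half-open interval and BETWEEN is easier to see with millisecond timestamps written out. The Python sketch below is a model of the comparison logic only, assuming a UTC connection time zone; the helper names (`to_millis`, `in_interval`, `between`) are hypothetical and do not correspond to Druid internals.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(year, month, day, hour=0, minute=0, second=0, milli=0):
    """Milliseconds since the epoch for a UTC wall-clock time."""
    moment = datetime(year, month, day, hour, minute, second,
                      milli * 1000, tzinfo=timezone.utc)
    return (moment - EPOCH) // timedelta(milliseconds=1)


def in_interval(t, start, end):
    """Half-open check, like TIME_IN_INTERVAL or the >= and < pair."""
    return start <= t < end


def between(t, low, high):
    """SQL BETWEEN: both bounds are inclusive."""
    return low <= t <= high


jan_start = to_millis(2000, 1, 1)
feb_start = to_millis(2000, 2, 1)
jan_last_milli = to_millis(2000, 1, 31, 23, 59, 59, 999)

# The first instant of February is outside the January interval...
print(in_interval(feb_start, jan_start, feb_start))        # False
# ...but BETWEEN with the same two bounds lets it through.
print(between(feb_start, jan_start, feb_start))            # True
# Ending BETWEEN one millisecond early restores the half-open behaviour.
print(between(feb_start, jan_start, jan_last_milli))       # False
print(between(jan_last_milli, jan_start, jan_last_milli))  # True
```

The one-millisecond adjustment only works because timestamps have millisecond precision, which is why the half-open forms are the less error-prone choice.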
Function | Notes |
---|---|
CURRENT_TIMESTAMP | Current timestamp in the connection’s time zone. |
CURRENT_DATE | Current date in the connection’s time zone. |
DATE_TRUNC(unit, timestamp_expr) | Rounds down a timestamp, returning it as a new timestamp. Unit can be ‘milliseconds’, ‘second’, ‘minute’, ‘hour’, ‘day’, ‘week’, ‘month’, ‘quarter’, ‘year’, ‘decade’, ‘century’, or ‘millennium’. |
TIME_CEIL(timestamp_expr, period, [origin, [timezone]]) | Rounds up a timestamp, returning it as a new timestamp. Period can be any ISO8601 period, like P3M (quarters) or PT12H (half-days). Specify origin as a timestamp to set the reference time for rounding. For example, TIME_CEIL(__time, 'PT1H', TIMESTAMP '2016-06-27 00:30:00') measures an hourly period from 00:30-01:30 instead of 00:00-01:00. See Period granularities for details on the default starting boundaries. The time zone, if provided, should be a time zone name like “America/Los_Angeles” or offset like “-08:00”. This function is similar to CEIL but is more flexible. |
TIME_FLOOR(timestamp_expr, period, [origin, [timezone]]) | Rounds down a timestamp, returning it as a new timestamp. Period can be any ISO8601 period, like P3M (quarters) or PT12H (half-days). Specify origin as a timestamp to set the reference time for rounding. For example, TIME_FLOOR(__time, 'PT1H', TIMESTAMP '2016-06-27 00:30:00') measures an hourly period from 00:30-01:30 instead of 00:00-01:00. See Period granularities for details on the default starting boundaries. The time zone, if provided, should be a time zone name like “America/Los_Angeles” or offset like “-08:00”. This function is similar to FLOOR but is more flexible. |
TIME_SHIFT(timestamp_expr, period, step, [timezone]) | Shifts a timestamp by a period (step times), returning it as a new timestamp. Period can be any ISO8601 period. Step may be negative. The time zone, if provided, should be a time zone name like “America/Los_Angeles” or offset like “-08:00”. |
TIME_EXTRACT(timestamp_expr, [unit, [timezone]]) | Extracts a time part from expr, returning it as a number. Unit can be EPOCH, SECOND, MINUTE, HOUR, DAY (day of month), DOW (day of week), DOY (day of year), WEEK (week of year), MONTH (1 through 12), QUARTER (1 through 4), or YEAR. The time zone, if provided, should be a time zone name like “America/Los_Angeles” or offset like “-08:00”. This function is similar to EXTRACT but is more flexible. Unit and time zone must be literals, and must be provided quoted, like TIME_EXTRACT(__time, 'HOUR') or TIME_EXTRACT(__time, 'HOUR', 'America/Los_Angeles'). |
TIME_PARSE(string_expr, [pattern, [timezone]]) | Parses a string into a timestamp using a given Joda DateTimeFormat pattern, or ISO8601 (e.g. 2000-01-02T03:04:05Z) if the pattern is not provided. The time zone, if provided, should be a time zone name like “America/Los_Angeles” or offset like “-08:00”, and will be used as the time zone for strings that do not include a time zone offset. Pattern and time zone must be literals. Strings that cannot be parsed as timestamps will be returned as NULL. |
TIME_FORMAT(timestamp_expr, [pattern, [timezone]]) | Formats a timestamp as a string with a given Joda DateTimeFormat pattern, or ISO8601 (e.g. 2000-01-02T03:04:05Z) if the pattern is not provided. The time zone, if provided, should be a time zone name like “America/Los_Angeles” or offset like “-08:00”. Pattern and time zone must be literals. |
TIME_IN_INTERVAL(timestamp_expr, interval) | Returns whether a timestamp is contained within a particular interval. The interval must be a literal string containing any ISO8601 interval, such as '2001-01-01/P1D' or '2001-01-01T01:00:00/2001-01-02T01:00:00'. The start instant of the interval is inclusive and the end instant is exclusive. |
MILLIS_TO_TIMESTAMP(millis_expr) | Converts a number of milliseconds since the epoch (1970-01-01 00:00:00 UTC) into a timestamp. |
TIMESTAMP_TO_MILLIS(timestamp_expr) | Converts a timestamp into a number of milliseconds since the epoch. |
EXTRACT(unit FROM timestamp_expr) | Extracts a time part from expr, returning it as a number. Unit can be EPOCH, MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR, DAY (day of month), DOW (day of week), ISODOW (ISO day of week), DOY (day of year), WEEK (week of year), MONTH, QUARTER, YEAR, ISOYEAR, DECADE, CENTURY or MILLENNIUM. Units must be provided unquoted, like EXTRACT(HOUR FROM __time). |
FLOOR(timestamp_expr TO unit) | Rounds down a timestamp, returning it as a new timestamp. Unit can be SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR. |
CEIL(timestamp_expr TO unit) | Rounds up a timestamp, returning it as a new timestamp. Unit can be SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR. |
TIMESTAMPADD(unit, count, timestamp) | Equivalent to timestamp + count * INTERVAL '1' UNIT. |
TIMESTAMPDIFF(unit, timestamp1, timestamp2) | Returns the (signed) number of unit between timestamp1 and timestamp2. Unit can be SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR. |
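The effect of the origin argument to TIME_FLOOR can be sketched with plain millisecond arithmetic. The Python model below covers only fixed-length periods such as PT1H; calendar periods like P1M and time zone handling are out of scope. The function `time_floor` and its helpers are hypothetical names for illustration, and defaulting the origin to the epoch is an assumption of this sketch, not a statement about Druid's implementation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MILLI = timedelta(milliseconds=1)


def millis(moment):
    """Milliseconds since the epoch for an aware datetime."""
    return (moment - EPOCH) // MILLI


def from_millis(ms):
    """Inverse of millis(), comparable to MILLIS_TO_TIMESTAMP."""
    return EPOCH + ms * MILLI


def time_floor(t_millis, period_millis, origin_millis=0):
    """Round down to the start of the fixed-length period containing t_millis."""
    elapsed = t_millis - origin_millis
    # Floor division also handles timestamps that fall before the origin.
    return origin_millis + (elapsed // period_millis) * period_millis


hour = 60 * 60 * 1000
origin = millis(datetime(2016, 6, 27, 0, 30, tzinfo=timezone.utc))
t = millis(datetime(2016, 6, 27, 1, 10, tzinfo=timezone.utc))

print(from_millis(time_floor(t, hour)))          # bucket starting at 01:00
print(from_millis(time_floor(t, hour, origin)))  # bucket starting at 00:30
```

With the 00:30 origin, 01:10 falls in the 00:30-01:30 bucket, which matches the example given for TIME_FLOOR in the table above.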
Reduction functions
- If all arguments are NULL, the result is NULL. Otherwise, NULL arguments are ignored.
- If the arguments comprise a mix of numbers and strings, the arguments are interpreted as strings.
- If all arguments are integer numbers, the arguments are interpreted as longs.
- If all arguments are numbers and at least one argument is a double, the arguments are interpreted as doubles.
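The type rules in the list above can be modelled with a small maximum-style reduction. This Python sketch is illustrative only: `reduce_max` is a hypothetical helper, not a Druid function, and the way numbers are rendered as strings (Python's `str`) is an assumption that may not match Druid's formatting.

```python
def reduce_max(*args):
    """Largest argument under the type rules listed above."""
    values = [a for a in args if a is not None]  # NULL arguments are ignored
    if not values:
        return None  # every argument was NULL
    if any(isinstance(v, str) for v in values):
        # A mix of numbers and strings is compared as strings.
        return max(str(v) for v in values)
    if all(isinstance(v, int) for v in values):
        return max(values)  # all integers: compared as longs
    # At least one double: everything is compared as doubles.
    return max(float(v) for v in values)


print(reduce_max(None, None))  # None
print(reduce_max(1, None, 3))  # 3
print(reduce_max(3, 2.5))      # 3.0
print(reduce_max(2, "10"))     # 2  (the string "2" sorts after "10")
```

The last call shows why mixing numbers and strings can surprise: once the arguments are treated as strings, the comparison is lexicographic rather than numeric.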
IP address functions
For the IPv4 address functions, the address argument can either be an IPv4 dotted-decimal string (e.g., “192.168.0.1”) or an IP address represented as an integer (e.g., 3232235521). The subnet argument should be a string formatted as an IPv4 address subnet in CIDR notation (e.g., “192.168.0.0/16”).
Function | Notes |
---|---|
IPV4_MATCH(address, subnet) | Returns true if the address belongs to the subnet literal, else false. If address is not a valid IPv4 address, then false is returned. This function is more efficient if address is an integer instead of a string. |
IPV4_PARSE(address) | Parses address into an IPv4 address stored as an integer. If address is an integer that is a valid IPv4 address, then it is passed through. Returns null if address cannot be represented as an IPv4 address. |
IPV4_STRINGIFY(address) | Converts address into an IPv4 address dotted-decimal string. If address is a string that is a valid IPv4 address, then it is passed through. Returns null if address cannot be represented as an IPv4 address. |
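The relationship between the dotted-decimal and integer forms, and the subnet test, can be reproduced with Python's standard `ipaddress` module. The sketch below models the behaviour described in the table above; the Python function names mirror the SQL names for readability but are not Druid code, and raising an error on a malformed subnet is a choice made for this sketch.

```python
import ipaddress


def ipv4_parse(address):
    """Integer form of an IPv4 address, or None if it is not valid."""
    try:
        return int(ipaddress.IPv4Address(address))
    except ValueError:
        return None


def ipv4_stringify(address):
    """Dotted-decimal form of an IPv4 address, or None if it is not valid."""
    try:
        return str(ipaddress.IPv4Address(address))
    except ValueError:
        return None


def ipv4_match(address, subnet):
    """True if address (string or integer) belongs to the CIDR subnet."""
    network = ipaddress.IPv4Network(subnet)  # raises ValueError if malformed
    try:
        return ipaddress.IPv4Address(address) in network
    except ValueError:
        return False  # an invalid address never matches


print(ipv4_parse("192.168.0.1"))                   # 3232235521
print(ipv4_stringify(3232235521))                  # 192.168.0.1
print(ipv4_match("192.168.0.1", "192.168.0.0/16"))  # True
print(ipv4_match("10.0.0.1", "192.168.0.0/16"))     # False
```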
Sketch functions
These functions operate on expressions or columns that return sketch objects. To create sketch objects, see the DataSketches aggregators.
The following functions operate on HLL sketches. The DataSketches extension must be loaded to use the following functions.
Function | Notes |
---|---|
HLL_SKETCH_ESTIMATE(expr, [round]) | Returns the distinct count estimate from an HLL sketch. expr must return an HLL sketch. The optional round boolean parameter will round the estimate if set to true, with a default of false. |
HLL_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(expr, [numStdDev]) | Returns the distinct count estimate and error bounds from an HLL sketch. expr must return an HLL sketch. An optional numStdDev argument can be provided. |
HLL_SKETCH_UNION([lgK, tgtHllType], expr0, expr1, …) | Returns a union of HLL sketches, where each input expression must return an HLL sketch. The lgK and tgtHllType can be optionally specified as the first parameter; if provided, both optional parameters must be specified. |
HLL_SKETCH_TO_STRING(expr) | Returns a human-readable string representation of an HLL sketch for debugging. expr must return an HLL sketch. |
The following functions operate on quantiles sketches. The DataSketches extension must be loaded to use the following functions.
Function | Notes |
---|---|
DS_GET_QUANTILE(expr, fraction) | Returns the quantile estimate corresponding to fraction from a quantiles sketch. expr must return a quantiles sketch. |
DS_GET_QUANTILES(expr, fraction0, fraction1, …) | Returns a string representing an array of quantile estimates corresponding to a list of fractions from a quantiles sketch. expr must return a quantiles sketch. |
DS_HISTOGRAM(expr, splitPoint0, splitPoint1, …) | Returns a string representing an approximation to the histogram given a list of split points that define the histogram bins from a quantiles sketch. expr must return a quantiles sketch. |
DS_CDF(expr, splitPoint0, splitPoint1, …) | Returns a string representing an approximation to the Cumulative Distribution Function given a list of split points that define the edges of the bins from a quantiles sketch. expr must return a quantiles sketch. |
DS_RANK(expr, value) | Returns an approximation to the rank of a given value that is the fraction of the distribution less than that value from a quantiles sketch. expr must return a quantiles sketch. |
DS_QUANTILE_SUMMARY(expr) | Returns a string summary of a quantiles sketch, useful for debugging. expr must return a quantiles sketch. |
Other scalar functions
Function | Notes |
---|---|
CAST(value AS TYPE) | Cast value to another type. See Data types for details about how Druid SQL handles CAST. |
CASE expr WHEN value1 THEN result1 [ WHEN value2 THEN result2 … ] [ ELSE resultN ] END | Simple CASE. |
CASE WHEN boolean_expr1 THEN result1 [ WHEN boolean_expr2 THEN result2 … ] [ ELSE resultN ] END | Searched CASE. |
NULLIF(value1, value2) | Returns NULL if value1 and value2 match, else returns value1. |
COALESCE(value1, value2, …) | Returns the first value that is neither NULL nor empty string. |
NVL(expr, expr-for-null) | Returns expr-for-null if expr is null (or empty string for string type). |
BLOOM_FILTER_TEST(expr, <serialized-filter>) | Returns true if the value of expr is contained in the Base64-serialized Bloom filter. See the Bloom filter extension documentation for additional details. See the BLOOM_FILTER function for computing Bloom filters. |
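The null-handling functions in this table treat empty strings specially, which is worth seeing side by side. The following Python sketch models NULLIF, COALESCE, and NVL as they are described above, with `None` standing in for NULL. It is an illustration, not Druid code; returning `None` from `coalesce` when no argument qualifies is an assumption of this sketch.

```python
def is_null_like(value):
    """NULL, or an empty string, as COALESCE and NVL treat them above."""
    return value is None or value == ""


def nullif(value1, value2):
    """NULL if the two values match, else value1."""
    return None if value1 == value2 else value1


def coalesce(*values):
    """First value that is neither NULL nor an empty string."""
    for value in values:
        if not is_null_like(value):
            return value
    return None  # assumption: nothing qualified


def nvl(expr, expr_for_null):
    """expr_for_null when expr is NULL (or an empty string), else expr."""
    return expr_for_null if is_null_like(expr) else expr


print(nullif("a", "a"))          # None
print(coalesce(None, "", "x"))   # x
print(nvl("", "n/a"))            # n/a
print(nvl(0, 5))                 # 0: zero is a real value, not NULL
```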