Window Functions
A frame
is one of:
frame_start
and frame_end
can be any of:
The window definition has 3 components:
The
PARTITION BY
clause separates the input rows into different partitions. This is analogous to how theGROUP BY
clause separates rows into different groups for aggregate functions. IfPARTITION BY
is not specified, the entire input is treated as a single partition.The
frame
clause specifies the sliding window of rows to be processed by the function for a given input row. A frame can beROWS
type orRANGE
type, and it runs fromframe_start
toframe_end
. Ifframe_end
is not specified, a default value ofCURRENT ROW
is used.In
ROWS
mode,CURRENT ROW
refers specifically to the current row. InRANGE
mode,CURRENT ROW
refers to any peer row of the current row for the purpose of the . If noORDER BY
is specified, all rows are considered peers of the current row. InRANGE
mode a frame start ofCURRENT ROW
refers to the first peer row of the current row, while a frame end ofCURRENT ROW
refers to the last peer row of the current row.Frame starts and ends of
expression PRECEDING
orexpression FOLLOWING
are currently only allowed inROWS
mode. They define the start or end of the frame as the specified number of rows before or after the current row. Theexpression
must be of typeINTEGER
.If no frame is specified, a default frame of
RANGE UNBOUNDED PRECEDING
is used.
The following query ranks orders for each clerk by price:
All Aggregate Functions can be used as window functions by adding the OVER
clause. The aggregate function is computed for each row over the rows within the current row’s window frame.
cume_dist() → bigint
Returns the cumulative distribution of a value in a group of values. The result is the number of rows preceding or peer with the row in the window ordering of the window partition divided by the total number of rows in the window partition. Thus, any tie values in the ordering will evaluate to the same distribution value.
dense_rank() → bigint
Returns the rank of a value in a group of values. This is similar to , except that tie values do not produce gaps in the sequence.
ntile(n) → bigint
Divides the rows for each window partition into n
buckets ranging from 1
to at most n
. Bucket values will differ by at most 1
. If the number of rows in the partition does not divide evenly into the number of buckets, then the remainder values are distributed one per bucket, starting with the first bucket.
For example, with 6
rows and 4
buckets, the bucket values would be as follows: 1
1
2
2
4
percent_rank() → double
Returns the percentage ranking of a value in group of values. The result is (r - 1) / (n - 1)
where r
is the rank() of the row and n
is the total number of rows in the window partition.
rank() → bigint
Returns the rank of a value in a group of values. The rank is one plus the number of rows preceding the row that are not peer with the row. Thus, tie values in the ordering will produce gaps in the sequence. The ranking is performed for each window partition.
Returns a unique, sequential number for each row, starting with one, according to the ordering of rows within the window partition.
Value functions provide an option to specify how null values should be treated when evaluating the function. Nulls can either be ignored (IGNORE NULLS
) or respected (RESPECT NULLS
). By default, null values are respected. If IGNORE NULLS
is specified, all rows where the value expression is null are excluded from the calculation. If IGNORE NULLS
is specified and the value expression is null for all rows, the default_value
is returned, or if it is not specified, null
is returned.
first_value(x) → [same as input]
Returns the first value of the window.
last_value(x) → [same as input]
Returns the last value of the window.
nth_value(x, offset) → [same as input]
Returns the value at the specified offset from beginning the window. Offsets start at 1
. The offset can be any scalar expression. If the offset is null or greater than the number of values in the window, null is returned. It is an error for the offset to be zero or negative.
lead(x[, offset[, default_value]]) → [same as input]
Returns the value at offset
rows after the current row in the window. Offsets start at 0
, which is the current row. The offset can be any scalar expression. The default offset
is 1
. If the offset is null or larger than the window, the default_value
is returned, or if it is not specified null
is returned.
lag(x[, offset[, default_value]]) → [same as input]