DataSketches Quantiles Sketch module
There are three major modes of operation:
- Ingesting sketches built outside of Druid (say, with Pig or Hive)
- Building sketches from raw data during ingestion
- Building sketches from raw data at query time
To use this aggregator, make sure you include the extension in your config file:
druid.extensions.loadList=["druid-datasketches"]
The aggregator is specified as follows:
{
"type" : "quantilesDoublesSketch",
"name" : <output_name>,
"fieldName" : <metric_name>,
"k": <parameter that controls size and accuracy>
}
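As a rough illustration of the aggregator spec above, the following Python sketch assembles it as a dictionary and serializes it to JSON. The helper name `quantiles_sketch_aggregator`, the output and metric names, the default of 128 for `k`, and the power-of-two check are all assumptions made for this example; consult the extension documentation for the allowed range of `k`.

```python
import json


def quantiles_sketch_aggregator(output_name, metric_name, k=128):
    """Build a quantilesDoublesSketch aggregator spec as a plain dict."""
    # Assumption: k is expected to be a power of two. Larger k means a
    # bigger sketch and better accuracy.
    if k < 2 or (k & (k - 1)) != 0:
        raise ValueError("k is assumed to be a power of 2, got %r" % (k,))
    return {
        "type": "quantilesDoublesSketch",
        "name": output_name,
        "fieldName": metric_name,
        "k": k,
    }


# Hypothetical output and metric names, for illustration only.
spec = quantiles_sketch_aggregator("latency_sketch", "latency_ms", k=256)
print(json.dumps(spec, indent=2))
```

The resulting dictionary can be placed in the aggregations list of an ingestion spec or a query.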
Post Aggregators
Quantile
This returns an approximation to the value that would be preceded by a given fraction of a hypothetical sorted version of the input stream.
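To make the definition concrete, here is a minimal Python sketch that computes the exact quantile on a small in-memory list. It is not how the sketch works internally (the sketch only approximates this result without keeping or sorting the whole stream), and the function name `exact_quantile` is hypothetical.

```python
import math


def exact_quantile(values, fraction):
    """Return the value preceded by `fraction` of the sorted input."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1 inclusive")
    ordered = sorted(values)
    # Pick the element that has floor(fraction * n) elements before it,
    # clamped so that fraction = 1.0 maps to the maximum value.
    index = min(int(math.floor(fraction * len(ordered))), len(ordered) - 1)
    return ordered[index]


stream = [5.0, 1.0, 4.0, 2.0, 3.0]
median = exact_quantile(stream, 0.5)  # 3.0: two of the five values precede it
```

A fraction of 0 yields the minimum of the stream and a fraction of 1 yields the maximum.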
Quantiles
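This returns an array of quantiles corresponding to a given array of fractions.
{
"type" : "quantilesDoublesSketchToQuantiles",
"name": <output name>,
"field" : <post aggregator that refers to a DoublesSketch (fieldAccess or another post aggregator)>,
"fractions" : <array of fractions, each a number from 0 to 1 inclusive>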
}
Histogram
Rank
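This returns an approximation to the rank of a given value, that is, the fraction of the distribution less than that value.
{
"type" : "quantilesDoublesSketchToRank",
"name": <output name>,
"field" : <post aggregator that refers to a DoublesSketch (fieldAccess or another post aggregator)>,
"value" : <value>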
}
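The rank definition can be illustrated with an exact computation on a small list. This is only a sketch of the semantics, not of the sketch algorithm, which approximates the same quantity. The function name `exact_rank` is hypothetical.

```python
from bisect import bisect_left


def exact_rank(values, value):
    """Return the fraction of `values` that is strictly less than `value`."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # bisect_left returns the number of elements strictly smaller than `value`.
    return bisect_left(ordered, value) / len(ordered)


stream = [5.0, 1.0, 4.0, 2.0, 3.0]
rank_of_four = exact_rank(stream, 4.0)  # 0.6: three of the five values are below 4.0
```

Rank is the inverse of the quantile lookup: a quantile maps a fraction to a value, while a rank maps a value to a fraction.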
CDF
This returns an approximation to the Cumulative Distribution Function given an array of split points that define the edges of the bins. An array of m unique, monotonically increasing split points divides the real number line into m+1 consecutive disjoint intervals. Each interval is inclusive of its left split point and exclusive of its right split point. The resulting array of fractions can be viewed as the ranks of the split points, followed by one additional rank that is always 1.
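The following Python sketch computes the exact CDF for a small list, to show the shape of the result: m split points produce m+1 fractions, the last of which is 1. It only illustrates the semantics described above; the sketch itself returns approximations. The function name `exact_cdf` is hypothetical.

```python
from bisect import bisect_left


def exact_cdf(values, split_points):
    """Return the rank of each split point, followed by a final 1.0."""
    if not values:
        raise ValueError("values must be non-empty")
    if any(a >= b for a, b in zip(split_points, split_points[1:])):
        raise ValueError("split points must be unique and increasing")
    ordered = sorted(values)
    n = len(ordered)
    # Intervals include their left edge, so the cumulative fraction at a
    # split point counts only the values strictly below it.
    ranks = [bisect_left(ordered, s) / n for s in split_points]
    return ranks + [1.0]


stream = [5.0, 1.0, 4.0, 2.0, 3.0]
cdf = exact_cdf(stream, [2.0, 4.0])  # [0.2, 0.6, 1.0]
```

Here one of the five values lies below 2.0 and three lie below 4.0, so the two split points yield 0.2 and 0.6, and the final entry covers the whole stream.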
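Sketch Summary
This returns a summary of the sketch that can be used for debugging. It is the result of calling the sketch's toString() method.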
{
"type" : "quantilesDoublesSketchToString",
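"name": <output name>,
"field" : <post aggregator that refers to a DoublesSketch (fieldAccess or another post aggregator)>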
}