The aggregator uses the same algorithm as Apache Hive. The following description is taken from GenericUDAFVariance in Hive.

Evaluate the variance using the algorithm described by Chan, Golub, and LeVeque in “Algorithms for computing the sample variance: analysis and recommendations”, The American Statistician, 37 (1983), pp. 242-247.

variance = variance1 + variance2 + n/(m*(m+n)) * pow(((m/n)*t1 - t2),2)

This algorithm was proven to be numerically stable by J.L. Barlow in “Error analysis of a pairwise summation algorithm to compute sample variance”, Numer. Math., 58 (1991), pp. 583-590.
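The merge formula above can be sketched in a few lines of Python. This is a minimal illustration, not the Hive or aggregator implementation. It assumes that `variance1` and `variance2` are the sums of squared deviations from the mean of each partial result (not yet divided by the count), that `n` and `t1` are the count and sum of the first partial, and that `m` and `t2` are the count and sum of the second. The names `Partial`, `from_values`, `merge`, and `sample_variance` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    count: int    # number of values in this partial result
    total: float  # sum of the values (t1 or t2 in the formula)
    m2: float     # sum of squared deviations from the mean ("variance" in the formula)


def from_values(values):
    """Build a partial result directly from raw values (two-pass, for illustration)."""
    n = len(values)
    total = float(sum(values))
    mean = total / n if n else 0.0
    return Partial(n, total, sum((v - mean) ** 2 for v in values))


def merge(a, b):
    """Combine two partial results with the pairwise formula."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n, m = a.count, b.count
    t1, t2 = a.total, b.total
    # variance = variance1 + variance2 + n/(m*(m+n)) * pow(((m/n)*t1 - t2), 2)
    m2 = a.m2 + b.m2 + n / (m * (m + n)) * ((m / n) * t1 - t2) ** 2
    return Partial(n + m, t1 + t2, m2)


def sample_variance(p):
    """Divide the accumulated sum of squared deviations by (count - 1)."""
    return p.m2 / (p.count - 1) if p.count > 1 else 0.0
```

Merging the partials for `[1, 2, 3]` and `[4, 5, 6, 7]` should give the same sum of squared deviations as computing it over all seven values at once, which is a quick way to check the reading of the symbols assumed here.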

Users can specify the expected input type for ingestion as one of “float”, “double”, “long”, or “variance”; the default is “float”.
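As a rough sketch, an ingestion-time aggregator entry could be built and validated like this. The helper `variance_aggregator` is hypothetical, and the `inputType` key is an assumption about how the input type is spelled in the aggregator JSON; check it against the extension before relying on it.

```python
import json

# The four input types named in the text above.
ALLOWED_INPUT_TYPES = ("float", "double", "long", "variance")


def variance_aggregator(name, field_name, input_type="float"):
    """Return a dict for a "variance" aggregator entry (hypothetical helper)."""
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(
            f"input_type must be one of {ALLOWED_INPUT_TYPES}, got {input_type!r}"
        )
    return {
        "type": "variance",
        "name": name,
        "fieldName": field_name,
        "inputType": input_type,  # assumed key name, not confirmed by the text
    }


if __name__ == "__main__":
    # Render the entry as JSON so it can be pasted into a spec.
    print(json.dumps(variance_aggregator("index_var", "index", "double"), indent=2))
```

Validating the input type up front surfaces a misspelled value before the spec is submitted.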

To query for results, the query must include either a “variance” aggregator with the “variance” input type or simply a “varianceFold” aggregator.

  {
    "type" : "varianceFold",
    "name" : <output_name>,
    "fieldName" : <metric_name>,
    "estimator" : <string>
  }

To obtain the standard deviation from the variance, use the “stddev” post aggregator.
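Conceptually, the post aggregator takes the square root of the named variance field. The sketch below mimics that on a single result row in plain Python; `apply_stddev` is a hypothetical helper for illustration, not part of the extension.

```python
import math


def apply_stddev(row, post_agg):
    """Add a standard deviation column to a result row.

    `post_agg` mirrors the JSON post aggregator: "fieldName" names the
    variance column to read and "name" names the output column.
    """
    out = dict(row)  # leave the input row untouched
    out[post_agg["name"]] = math.sqrt(row[post_agg["fieldName"]])
    return out


post_agg = {"type": "stddev", "name": "index_stddev", "fieldName": "index_var"}
result = apply_stddev({"alias": "a", "index_var": 9.0}, post_agg)
```

Here `result` keeps the original `index_var` value and gains an `index_stddev` entry alongside it.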

Query examples:

  {
    "queryType": "timeseries",
    "dataSource": "testing",
    "granularity": "day",
    "aggregations": [
      {
        "type": "variance",
        "name": "index_var",
        "fieldName": "index_var"
      }
    ],
    "intervals": [
      "2016-03-01T00:00:00.000/2016-03-20T00:00:00.000"
    ]
  }

  {
    "queryType": "groupBy",
    "dataSource": "testing",
    "dimensions": ["alias"],
    "granularity": "all",
    "aggregations": [
      {
        "type": "variance",
        "name": "index_var",
        "fieldName": "index"
      }
    ],
    "postAggregations": [
      {
        "type": "stddev",
        "name": "index_stddev",
        "fieldName": "index_var"
      }
    ],
    "intervals": [
      "2016-03-06T00:00:00/2016-03-06T23:59:59"
    ]