To use this aggregator, make sure you include the extension in your config file. The aggregator is specified as follows:

  {
    "type" : "thetaSketch",
    "name" : <output_name>,
    "fieldName" : <metric_name>,
    "isInputThetaSketch": false,
    "size": 16384
  }

Sketch Estimator

  {
    "type" : "thetaSketchEstimate",
    "name": <output name>,
    "field" : <post aggregator of type fieldAccess that refers to a thetaSketch aggregator or that of type thetaSketchSetOp>
  }
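
As an illustration, a filled-in estimator might look like the sketch below. The names `unique_users_estimate` and `unique_users` are hypothetical; `unique_users` is assumed to be the output name of a thetaSketch aggregator defined in the same query.

```json
{
  "type": "thetaSketchEstimate",
  "name": "unique_users_estimate",
  "field": {
    "type": "fieldAccess",
    "fieldName": "unique_users"
  }
}
```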

Sketch Operations

Sketch Summary

This returns a summary of the sketch that can be used for debugging. The summary is the result of calling the sketch's toString() method.

  {
    "type" : "thetaSketchToString",
    "name": <output name>,
    "field" : <post aggregator that refers to a Theta sketch (fieldAccess or another post aggregator)>
  }
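
For instance, a summary post aggregator could be written roughly as follows. The names `unique_users_summary` and `unique_users` are hypothetical; `unique_users` is assumed to be the output name of a thetaSketch aggregator in the same query.

```json
{
  "type": "thetaSketchToString",
  "name": "unique_users_summary",
  "field": {
    "type": "fieldAccess",
    "fieldName": "unique_users"
  }
}
```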

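Assume you have a dataset containing (timestamp, product, user_id), and you want to answer questions like "How many unique users visited product A?" or "How many unique users visited both product A and B?"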

To answer these questions, you would index your data using the following aggregator:

  { "type": "thetaSketch", "name": "user_id_sketch", "fieldName": "user_id" }

Then, a sample query for "How many unique users visited product A?"
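
The query for this question is not reproduced here. A minimal sketch, modeled on the two-product query in this example, might filter on product A and aggregate the indexed sketch column directly. The output name `unique_users` is hypothetical, and the data source and interval are simply reused from the two-product query.

```json
{
  "queryType": "groupBy",
  "dataSource": "test_datasource",
  "granularity": "ALL",
  "filter": { "type": "selector", "dimension": "product", "value": "A" },
  "aggregations": [
    { "type": "thetaSketch", "name": "unique_users", "fieldName": "user_id_sketch" }
  ],
  "intervals": [
    "2014-10-19T00:00:00.000Z/2014-10-22T00:00:00.000Z"
  ]
}
```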

A sample query for "How many unique users visited both product A and B?":

  {
    "queryType": "groupBy",
    "dataSource": "test_datasource",
    "granularity": "ALL",
    "filter": {
      "type": "or",
      "fields": [
        {"type": "selector", "dimension": "product", "value": "A"},
        {"type": "selector", "dimension": "product", "value": "B"}
      ]
    },
    "aggregations": [
      {
        "type" : "filtered",
        "filter" : {
          "type" : "selector",
          "dimension" : "product",
          "value" : "A"
        },
        "aggregator" : {
          "type": "thetaSketch", "name": "A_unique_users", "fieldName": "user_id_sketch"
        }
      },
      {
        "type" : "filtered",
        "filter" : {
          "type" : "selector",
          "dimension" : "product",
          "value" : "B"
        },
        "aggregator" : {
          "type": "thetaSketch", "name": "B_unique_users", "fieldName": "user_id_sketch"
        }
      }
    ],
    "postAggregations": [
      {
        "type": "thetaSketchEstimate",
        "name": "final_unique_users",
        "field": {
          "type": "thetaSketchSetOp",
          "name": "final_unique_users_sketch",
          "func": "INTERSECT",
          "fields": [
            {
              "type": "fieldAccess",
              "fieldName": "A_unique_users"
            },
            {
              "type": "fieldAccess",
              "fieldName": "B_unique_users"
            }
          ]
        }
      }
    ],
    "intervals": [
      "2014-10-19T00:00:00.000Z/2014-10-22T00:00:00.000Z"
    ]
  }
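
The same pattern can be adapted to other set operations. As a hedged sketch, assuming the thetaSketchSetOp post aggregator also accepts `NOT` as a `func` value and subtracts the later fields from the first, the following postAggregations entry would estimate the users who visited product A but not product B. The names `A_not_B_unique_users` and `A_not_B_unique_users_sketch` are hypothetical; the two filtered aggregators are the ones defined in the query for both products.

```json
{
  "type": "thetaSketchEstimate",
  "name": "A_not_B_unique_users",
  "field": {
    "type": "thetaSketchSetOp",
    "name": "A_not_B_unique_users_sketch",
    "func": "NOT",
    "fields": [
      { "type": "fieldAccess", "fieldName": "A_unique_users" },
      { "type": "fieldAccess", "fieldName": "B_unique_users" }
    ]
  }
}
```

This entry would replace the INTERSECT post aggregator in that query; the filter and aggregations stay unchanged.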

Retention Analysis Example

Suppose you want to answer a question such as “How many unique users signed up in week 1 and purchased something in week 2?”

Using the example dataset, data would be indexed with the following aggregator, like in the example above:

  { "type": "thetaSketch", "name": "user_id_sketch", "fieldName": "user_id" }

The following query expresses: