anomaly_detector
You can configure the anomaly detector processor by specifying a key and the options for the selected mode. You can use the following options to configure the anomaly detector processor.
Keys that are used in the anomaly detector processor must be present in the input event. For example, if the input event is {"key1": value1, "key2": value2, "key3": value3}, then any of the keys (such as key1, key2, or key3) in that input event can be used as anomaly detector keys, as long as their values (such as value1, value2, or value3) are integers or real numbers.
random_cut_forest mode
Random Cut Forest (RCF) is an unsupervised machine learning (ML) algorithm for detecting anomalous data points within a dataset. Data Prepper detects anomalies by passing the values of the configured keys to RCF. For example, when an event with a latency value of 11.5 is sent, the following anomaly event is generated:
In this example, deviation_from_expected is a list of deviations for each of the keys from their corresponding expected values, and grade is the anomaly grade that indicates the anomaly severity.
Usage
To get started, create the following pipeline.yaml file. This pipeline configuration uses random_cut_forest mode to look for anomalies in the latency field of events that are passed to the processor:
ad-pipeline:
  source:
    ...
  processor:
    - anomaly_detector:
        keys: ["latency"]
        mode:
          random_cut_forest:
When you run the anomaly detector processor, the processor extracts the value for the latency key and then passes the value through the RCF ML algorithm. You can configure any key whose values are integers or real numbers. In the following example, you can configure bytes or latency as the key for an anomaly detector.
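As a minimal sketch, assuming incoming events carry a numeric bytes field, the same pipeline could point the processor at bytes instead of latency. The pipeline name and the elided source are carried over from the configuration above, and no random_cut_forest options are set here.

```yaml
ad-pipeline:
  source:
    ...   # source settings elided, as in the configuration above
  processor:
    - anomaly_detector:
        # Assumption: each event has a "bytes" field holding an integer or real number.
        keys: ["bytes"]
        mode:
          random_cut_forest:   # no mode options are set in this sketch
```

Only the keys entry changes between the two configurations. The processor reads whichever field is listed there and passes its value to RCF.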