Anomaly detection applies to various scenarios, such as intrusion detection, financial fraud detection, sensor data monitoring, medical diagnosis, and natural data detection. Typical anomaly detection algorithms include statistical modeling methods, distance-based methods, linear models, and nonlinear models.
DLI uses an anomaly detection method based on the random forest, which has the following characteristics:
SRF_UNSUP(ARRAY[Field 1, Field 2, ...], 'Optional parameter list')
| Parameter | Mandatory | Description | Default Value |
|---|---|---|---|
| transientThreshold | No | Threshold at which a change in the histogram is taken to indicate a change in the data. | 5 |
| numTrees | No | Number of trees in the random forest. | 15 |
| maxLeafCount | No | Maximum number of leaf nodes a tree can have. | 15 |
| maxTreeHeight | No | Maximum height of a tree. | 12 |
| seed | No | Random seed value used by the algorithm. | 4010 |
| numClusters | No | Number of types of data to be detected. By default, there are two types: anomalous data and normal data. | 2 |
| dataViewMode | No | Algorithm learning mode. | history |
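As a rough sketch of how several of these parameters might be combined, the query below scores two fields together and overrides some of the defaults. The stream `SensorStream`, its fields `temperature` and `pressure`, and the alias `anomaly_score` are hypothetical names introduced for illustration. The comma-separated `key=value` parameter string is assumed to accept any parameter from the table above, and the chosen values are illustrative rather than tuned recommendations.

```sql
-- Hypothetical stream SensorStream with numeric fields temperature and pressure
-- and a processing-time attribute proctime (all of these names are assumptions).
SELECT
  temperature,
  pressure,
  SRF_UNSUP(
    ARRAY[temperature, pressure],                             -- fields scored together
    'numTrees=20,maxLeafCount=15,maxTreeHeight=10,seed=4010'  -- illustrative overrides
  ) OVER (
    ORDER BY proctime
    RANGE BETWEEN INTERVAL '99' SECOND PRECEDING AND CURRENT ROW
  ) AS anomaly_score
FROM SensorStream
```

Parameters left out of the string, such as `transientThreshold` and `numClusters` here, would presumably fall back to the default values listed in the table.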
Anomaly detection is performed on field c of the data stream MyTable. A record is considered an anomaly if its anomaly score is greater than 0.8.
SELECT c,
  CASE WHEN SRF_UNSUP(ARRAY[c], 'numTrees=15,seed=4010')
      OVER (ORDER BY proctime RANGE BETWEEN INTERVAL '99' SECOND PRECEDING AND CURRENT ROW) > 0.8
    THEN 'anomaly'
    ELSE 'not anomaly'
  END
FROM MyTable
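When only the anomalous records are of interest, one possible variation is to compute the score once in a subquery and filter on it. This is a sketch under assumptions: the aliases `score` and `scored` are hypothetical, and it presumes that the result of `SRF_UNSUP` can be selected as an ordinary column and filtered in an outer query.

```sql
-- Keep only the rows whose anomaly score exceeds the 0.8 threshold.
SELECT c, score
FROM (
  SELECT
    c,
    SRF_UNSUP(ARRAY[c], 'numTrees=15,seed=4010') OVER (
      ORDER BY proctime
      RANGE BETWEEN INTERVAL '99' SECOND PRECEDING AND CURRENT ROW
    ) AS score            -- raw anomaly score for each row
  FROM MyTable
) scored
WHERE score > 0.8
```

Returning the raw score alongside c also makes it easier to check whether 0.8 is a suitable threshold for a given stream.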