Current Key Measurements and Associated Anomaly Detection Algorithms

This topic is a quick-reference table: for each key measurement that Riverbed IQ Ops ingests from your data sources, it shows which anomaly detection algorithm (or algorithms) the platform uses to decide when that measurement is anomalous and should produce an indicator. The mapping is preconfigured by Riverbed IQ Ops. You can change some of it (e.g. enable or disable a metric, or tune threshold and baseline settings) on the Analytics & Threshold Configuration page, but the table below is the single place to see the full picture of which algorithm applies to which metric.

Not every measurement uses the same kind of logic. For example, Device Status uses change detection (up/down), User Response Time uses baselines (learned normal range), and some utilization metrics support both a threshold and a baseline policy. Knowing which algorithm applies to a metric helps you interpret incidents (why did this fire?), configure analytics (what can I tune for this metric?), and plan what to expect from each data source. For descriptions of each algorithm, see Ingest & Analytics: Indicators.

How to use the table: Each row is one measurement, identified by Data Source, Entity, and Metric. The columns are the algorithm types ([1] Change Detection, [2] Always Increasing, [3] Thresholds, [4] Baselines, [5] Dynamic Threshold, and [6] Bounded Dynamic Threshold). A checkmark in a column means that algorithm is used for that metric. Some metrics have more than one algorithm (e.g. both threshold and baseline). See the notes below the table for caveats, including when an incident is generated only when multiple algorithms agree.
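As a rough illustration of how the table can be read programmatically, the sketch below models each row as a (Data Source, Entity, Metric) key mapped to a set of algorithm types. The names `Algorithm`, `ALGORITHMS`, and `algorithms_for` are hypothetical and are not part of any Riverbed API. Only the pairings stated explicitly in the introduction (Device Status with change detection, User Response Time with baselines) are filled in; the remaining rows would come from the table itself.

```python
from enum import Enum


class Algorithm(Enum):
    """The six algorithm columns, numbered as in the table header."""
    CHANGE_DETECTION = 1
    ALWAYS_INCREASING = 2
    THRESHOLDS = 3
    BASELINES = 4
    DYNAMIC_THRESHOLD = 5
    BOUNDED_DYNAMIC_THRESHOLD = 6


# Hypothetical lookup keyed by (data source, entity, metric).
# Only pairings named in the introductory text are included here.
ALGORITHMS = {
    ("Riverbed NetIM", "Device", "Device Status"): {Algorithm.CHANGE_DETECTION},
    ("Riverbed NetProfiler", "Application / Client Location",
     "User Response Time"): {Algorithm.BASELINES},
    ("Riverbed AppResponse", "Application / Client Location",
     "User Response Time"): {Algorithm.BASELINES},
}


def algorithms_for(data_source, entity, metric):
    """Return the set of algorithms for a row, or an empty set if the row is unknown."""
    return ALGORITHMS.get((data_source, entity, metric), set())
```

A set is used for the value because a single metric can have more than one algorithm.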

Quick reference table

Riverbed IQ Ops Analytics Pipeline - Quick Reference

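Algorithm columns: [1] Change Detection, [2] Always Increasing, [3] Thresholds, [4] Baselines, [5] Dynamic Threshold, [6] Bounded Dynamic Threshold

Data Source | Entity | Metric | Algorithm marks
Riverbed NetProfiler | Application / Client Location | User Response Time (note 1) | (error)
Riverbed NetProfiler | Application / Client Location | MoS | (error)
Riverbed NetProfiler | Interface | In Utilization | (note 3) (note 3) (error)
Riverbed NetProfiler | Interface | Out Utilization | (note 3) (note 3) (error)
Riverbed AppResponse | Application / Client Location | User Response Time (note 1) | (error)
Riverbed AppResponse | Application / Client Location | Throughput (note 2) | (error)
Riverbed AppResponse | Application / Client Location | % Retrans Packets | (error)
Riverbed AppResponse | Application / Client Location | % Failed Connections | (error)
Riverbed NetIM | Device | Device Status | (error)
Riverbed NetIM | Device | Device Uptime | (error)
Riverbed NetIM | Interface | Interface Status | (error)
Riverbed NetIM | Interface | In Packet Error Rate | (error)
Riverbed NetIM | Interface | Out Packet Error Rate | (error)
Riverbed NetIM | Interface | In Packet Drops Rate | (error)
Riverbed NetIM | Interface | Out Packet Drops Rate | (error)
Riverbed NetIM | Interface | In Utilization | (note 3) (note 3) (error)
Riverbed NetIM | Interface | Out Utilization | (note 3) (note 3) (error)
Aternity | Application / Client Location | Activity Network Time | (error) (error)
Aternity | Application / Client Location | Activity Response Time | (error) (error) (error) (error) (error)
Aternity | Application / Client Location | Page Load Network Time | (error)
Aternity | Application / Client Location | % Hang Time | (error) (error) (error) (error) (error)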

Notes:

 

1 - [Metric: User Response Time]:

  • On Riverbed NetProfiler, the value is an approximation of Riverbed AppResponse [user-response-time], because Riverbed NetProfiler does not yet account for [connection_setup_time], while Riverbed AppResponse does, i.e.

Riverbed AppResponse User Response Time calculation: [user-response-time]
= ([connection_setup_time] / [connection_setup_time_n])
+ ([request_network_time] / [request_network_n])
+ ([response_network_time] / [response_network_n])
+ ([server_delay] / [server_delay_n])

Riverbed NetProfiler User Response Time calculation: [user-response-time]
= ([request_network_time] / [request_network_n])
+ ([response_network_time] / [response_network_n])
+ ([server_delay] / [server_delay_n])

  • User Response Time is only processed for "named" applications (for example, ICMP, SNMP, TCP_Unknown, and UDP_Unknown are excluded).
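The difference between the two calculations in note 1 can be sketched as follows. This is a minimal illustration, not the platform's implementation: the function names and the dictionary layout are assumptions, the field names are taken from the bracketed names in the note, and treating a zero sample count as contributing 0.0 is also an assumption.

```python
def _avg(total, count):
    """Average of a summed timing field; 0.0 when there are no samples (assumption)."""
    return total / count if count else 0.0


def user_response_time(sample, include_connection_setup):
    """Sum the per-component averages that make up User Response Time.

    include_connection_setup=True mirrors the Riverbed AppResponse calculation;
    False mirrors Riverbed NetProfiler, which does not yet account for
    connection setup time.
    """
    urt = (
        _avg(sample["request_network_time"], sample["request_network_n"])
        + _avg(sample["response_network_time"], sample["response_network_n"])
        + _avg(sample["server_delay"], sample["server_delay_n"])
    )
    if include_connection_setup:
        urt += _avg(sample["connection_setup_time"], sample["connection_setup_time_n"])
    return urt


# Made-up sample values, for illustration only.
sample = {
    "connection_setup_time": 40.0, "connection_setup_time_n": 4,
    "request_network_time": 60.0, "request_network_n": 4,
    "response_network_time": 100.0, "response_network_n": 4,
    "server_delay": 200.0, "server_delay_n": 4,
}
appresponse_urt = user_response_time(sample, include_connection_setup=True)
netprofiler_urt = user_response_time(sample, include_connection_setup=False)
```

With these sample values the two results differ by exactly the average connection setup time, which is why the same application can report a lower User Response Time on Riverbed NetProfiler than on Riverbed AppResponse.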

2 - [Metric: Throughput]: only monitored for VoIP-related applications: {VOIP, SIP, RTP}.
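The application filters in notes 1 and 2 can be sketched as a single check. The function name `is_processed` is hypothetical, the exclusion list for User Response Time contains only the examples given in note 1 (the real list may be longer), and both case-sensitive matching and the default of processing every other metric are assumptions.

```python
# Examples of "unnamed" applications from note 1; not necessarily exhaustive.
UNNAMED_APPLICATIONS = {"ICMP", "SNMP", "TCP_Unknown", "UDP_Unknown"}

# VoIP-related applications from note 2.
VOIP_APPLICATIONS = {"VOIP", "SIP", "RTP"}


def is_processed(metric, application):
    """Return True if the metric would be processed for this application."""
    if metric == "User Response Time":
        # Only "named" applications are processed.
        return application not in UNNAMED_APPLICATIONS
    if metric == "Throughput":
        # Only VoIP-related applications are monitored.
        return application in VOIP_APPLICATIONS
    # The notes place no application restriction on other metrics (assumption).
    return True
```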

3 - [Anomaly Detection Algorithms]: for compounded anomaly detection algorithms, the platform generates an incident only when both algorithms detect anomalous behavior.
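The "both algorithms must agree" rule in note 3 amounts to a logical AND of the individual detectors. The sketch below assumes a threshold detector paired with a baseline detector, a simple upper-bound threshold, and a baseline expressed as a learned low/high range; the function names and parameters are hypothetical and the platform's actual detectors may work differently.

```python
def exceeds_threshold(value, threshold):
    """Threshold detector: anomalous when the value is above a fixed limit."""
    return value > threshold


def outside_baseline(value, baseline_low, baseline_high):
    """Baseline detector: anomalous when the value leaves the learned normal range."""
    return value < baseline_low or value > baseline_high


def compound_incident(value, threshold, baseline_low, baseline_high):
    """Raise an incident only when both detectors flag the same value."""
    return exceeds_threshold(value, threshold) and outside_baseline(
        value, baseline_low, baseline_high
    )
```

For example, a utilization of 95 with a threshold of 90 and a learned range of 20 to 60 would raise an incident, while the same value on an interface whose learned range extends to 99 would not, because only the threshold detector fires.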