Analytics & Threshold Configuration

  1. Navigate to the Analytics & Threshold Configuration Page:

    1. Click the Waffle Icon.

    2. Click AI Ops > Analytics & Incidents.

    3. In the Management page, click the Hamburger Icon, then click Analytics & Threshold Configuration.

This page lets you view and configure the settings that determine when and how Riverbed IQ Ops recognizes network behavior as indicators. Indicators are the fundamental building blocks of incidents.

A policy controls how the Riverbed IQ Ops data pipeline creates indicators and, therefore, incidents. Upon installation, IQ Ops includes a number of built-in policies that are configured according to best practices for network monitoring and that determine which measurements merit being called to your attention in the form of an incident. Because what merits your attention is a subjective decision that depends on the user and the network, IQ Ops lets you override some policies with user-configurable settings. A single policy applies a rule to a measurement on a network entity.

Analytics Algorithms Overview

The Analytics service on IQ Ops currently includes six algorithms. Two of these algorithms, the Change Detector (operating on interface and device status values) and the Always Increasing Detector (operating on device uptime measurements), are not planned to have user-editable configurations, apart from enabling or disabling the algorithm. The other four algorithms (Threshold, Baseline, Dynamic Threshold, and Bounded Dynamic Threshold) have user-configurable parameters that can be used to tune the algorithm for better performance on a customer's network. Of these, the Threshold algorithm is already editable and therefore will be skipped.

N of M Parameters

These parameters are available for configuration for each of the four algorithms; they restrict indicator production to only those times when the algorithm has seen anomalous behavior in N out of the last M observations.

Example: A Threshold of 80% on inbound utilization with N of 4 and M of 5 means an inbound utilization above 80% must be seen in 4 of the last 5 observations for an indicator to be produced.
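
The N of M rule in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name n_of_m_triggered and its arguments are hypothetical and are not part of IQ Ops, and the sketch assumes a simple "greater than" comparison against a static threshold.

```python
from collections import deque


def n_of_m_triggered(observations, threshold, n, m):
    """Return True if at least n of the last m observations exceed threshold.

    Hypothetical helper for illustration only; not an IQ Ops API.
    """
    # A deque with maxlen=m keeps only the m most recent observations.
    window = deque(observations, maxlen=m)
    violations = sum(1 for value in window if value > threshold)
    return violations >= n


# Inbound utilization samples (percent), oldest first.
samples = [85, 90, 70, 88, 95]
# Four of the last five samples are above 80, so the rule is satisfied.
print(n_of_m_triggered(samples, threshold=80, n=4, m=5))
```

With N of 4 and M of 5, a single dip below the threshold (the 70 in the sample data) does not prevent the indicator, but two dips would.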

The Analytics & Threshold Configuration page comprises three sections:

  1. Devices

  2. Interfaces

  3. Applications

Each section has a set of relevant metrics that you can enable/disable. In addition, you can configure thresholds.

Network Devices

  • Device Status [Status Change]

  • Device Uptime [Uptime Reset]

Network Interfaces

  • Interface Status [Status Change]

  • In Packet Drops Rate [Baselining]

  • In Packet Error Rate [Baselining]

  • In Utilization [Baselining]

  • In Utilization [Threshold]

  • Out Packet Drops Rate [Threshold]

  • Out Packet Error Rate [Threshold]

  • Out Utilization [Baselining]

  • Out Utilization [Threshold]

Application + Location (for example, application “MS-Exchange” in location “Branch Office Denver”)

  • Activity Network Time [Dynamic Threshold]

  • Activity Response Time [Dynamic Threshold]

  • % Failed Connections [Threshold]

  • MOS [Threshold]

  • Page Load Network Time [Dynamic Threshold]

  • % Total Retrans [Threshold]

  • Throughput (VoIP) [Baselining]

  • User Response Time [Baselining]

  • Unique Critical Health Events [Baselining]

  • Unique Major Health Events [Baselining]

  • % Hang Time [Bounded Dynamic Threshold]

Aternity Health Event Metrics

The Unique Critical Health Events and Unique Major Health Events metrics track health event counts from Aternity. Aternity tracks the counts of health events by application, location, and severity once per hour. IQ Ops streams this data and monitors these counts, creating incidents when either the critical or major health event counts for a particular application and location exceed the expected baseline by a significant amount.

Both metrics use baseline-based anomaly detection. You can enable or disable these policies as desired, but like other baseline policies, they cannot be edited. For more information about how these incidents are created and how to use runbooks to investigate them, see Aternity Health Event Incident Automation.

Edit a Static Threshold Value

To edit a static threshold value, click Edit to open the Edit Static Threshold dialog box.

Specify the value at which this threshold generates an indicator and the number of measurements to use. You can create an indicator based on a single measurement of the metric, on some number of consecutive measurements, or on N out of M measurements. For example, specifying "2 out of 3" means that the rule must be violated in 2 of the last 3 measurements for an indicator to be created. N and M default to 1. M has a maximum value of 10. N must be <= M. If M > 1, N must also be > 1.
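
The constraints on N and M can be expressed as a small validation routine. The following is a minimal sketch, not the product's own validation: the function name validate_n_of_m is hypothetical, and the lower bound of 1 for both values is an assumption inferred from the default of 1.

```python
def validate_n_of_m(n=1, m=1):
    """Return a list of rule violations; an empty list means N and M are valid.

    Hypothetical helper for illustration only; not an IQ Ops API.
    """
    errors = []
    # Assumption: values below 1 are invalid, since both default to 1.
    if n < 1 or m < 1:
        errors.append("N and M must be at least 1")
    if m > 10:
        errors.append("M has a maximum value of 10")
    if n > m:
        errors.append("N must be <= M")
    if m > 1 and n <= 1:
        errors.append("N must be > 1 when M > 1")
    return errors


print(validate_n_of_m(2, 3))  # valid: prints []
print(validate_n_of_m(1, 3))  # invalid: N must be > 1 when M > 1
```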

Configure Baseline Settings

To configure baseline settings, click Edit for a metric that uses baselining to open the baseline configuration dialog box.

The baseline algorithm learns the seasonal (time of day/day of week) variation of the observed metric. For every 15-minute period, an expected value for that metric is calculated based on the metric's learned history on the entity being tracked. When the observed value deviates from the expected value, an indicator is created. The following parameters control how large a deviation is required for an indicator to be produced.

Baseline configuration includes the following settings:

  • Change Above Expected: This value should be greater than 1. If the observed value is greater than the expected value × Change Above Expected, an indicator can be produced. Larger values make it harder to create an indicator. For example, a Change Above Expected of 1.2 means that, for an expected value of 10.0, the observed value must be greater than 12 (10 × 1.2) to be an indicator. If this parameter is null, the algorithm will not create indicators where the observed value is greater than the expected value. Either Change Above or Change Below must have a value.

  • Change Below Expected: This value should be less than 1. If the observed value is less than the expected value × Change Below Expected, an indicator can be produced. Smaller values make it harder to create an indicator. For example, a Change Below Expected of 0.8 means that, for an expected value of 10.0, the observed value must be less than 8 (10 × 0.8) to be an indicator. If this parameter is null, the algorithm will not create indicators where the observed value is less than the expected value. Either Change Above or Change Below must have a value.

  • Minimum Tolerance: This parameter is similar to Change Above/Below Expected, but utilizes an absolute tolerance rather than the relative tolerance of Change Above/Below Expected. Larger values make it harder to create an indicator. For example, a Minimum Tolerance value of 5.0 means that, for an expected value of 10.0, the observed value must be greater than 15 (10 + 5.0) or less than 5 (10 - 5.0) to be an indicator. If this parameter is null, it will not be used to restrict the creation of indicators.

  • N of M parameters: These parameters restrict indicator production to only those times when the algorithm has seen anomalous behavior in N out of the last M observations.

  • Metric name and unit: The metric name and the unit of the required minimum deviation are automatically adjusted based on the metric unit in use.

  • Granularity: The granularity setting (for example, 15 mins) represents the time interval used for baselining and must be adjusted based on what Analytics supports.

  • Validation: All numeric values must be positive numbers.
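
The way the relative and absolute tolerances interact can be illustrated with a short Python sketch. It rests on assumptions: the function name baseline_anomalous is hypothetical, the expected value is taken as given (the learning of the baseline is not modeled), and Minimum Tolerance is assumed to act as an additional restriction on top of the Change Above/Below checks, as the parameter descriptions suggest.

```python
def baseline_anomalous(observed, expected, change_above=None,
                       change_below=None, min_tolerance=None):
    """Return True if the observation deviates enough from the expected value.

    Hypothetical helper for illustration only; not an IQ Ops API.
    """
    if change_above is None and change_below is None:
        raise ValueError("Either Change Above or Change Below must have a value")

    # Relative checks: a null (None) parameter disables that direction.
    too_high = change_above is not None and observed > expected * change_above
    too_low = change_below is not None and observed < expected * change_below
    if not (too_high or too_low):
        return False

    # Absolute check (assumed to be applied in addition to the relative one).
    if min_tolerance is not None and abs(observed - expected) <= min_tolerance:
        return False
    return True


# Expected value 10.0, Change Above Expected 1.2: the observation must exceed 12.
print(baseline_anomalous(12.5, 10.0, change_above=1.2))  # True
# Adding a Minimum Tolerance of 5.0 requires a deviation of more than 5.
print(baseline_anomalous(12.5, 10.0, change_above=1.2, min_tolerance=5.0))  # False
```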

Configure Dynamic Threshold Settings

To configure dynamic threshold settings, click Edit for a metric that uses dynamic threshold to open the dynamic threshold configuration dialog box.

The Dynamic Threshold algorithm uses a statistical analysis of the metric's history on the entity being tracked to produce an expected value for the metric every 15 minutes. This algorithm also tracks the number of measurements that were recorded in that 15-minute window in order to filter out noisy measurements.

Dynamic threshold configuration includes the following settings:

  • Probability of Observed Value is Above %: A percentage, between 0 and 100, representing how likely it is that the observed value is normal. Values closer to 100% will result in fewer indicators.

  • Probability of Number of Observations is Above %: A percentage, between 0 and 100, representing how likely it is that the number of observations is normal. Values closer to 100% will result in fewer indicators, but this value should typically be below 20%. If null, this parameter will not be used to restrict the creation of indicators.

  • Required minimum deviation: The metric name and the unit of the required minimum deviation are automatically adjusted based on the metric unit in use. This value must be a positive number.

  • Minimum Tolerance: Uses an absolute tolerance rather than a probability to determine whether an observation is different enough from the expected value to produce an indicator. Larger values make it harder to create an indicator. For example, a Minimum Tolerance value of 5.0 means that, for an expected value of 10.0, the observed value must be greater than 15 (10 + 5.0) or less than 5 (10 - 5.0) to be an indicator. If null, this parameter will not be used to restrict the creation of indicators.

  • Minimum Number of Observations: A threshold on the number of observations; if present, it should be an integer greater than 0. If the number of observations is not above this value, an indicator cannot be created, no matter how unlikely the observation is, given the learned history of the metric. Larger values make it harder to create indicators. If this parameter is null, no threshold on the number of observations will be applied.

  • N of M parameters: These parameters restrict indicator production to only those times when the algorithm has seen anomalous behavior in N out of the last M observations.

  • Validation: The required minimum deviation must be positive. The probability must be strictly higher than 0 and strictly lower than 100.
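
The validation rules for these settings can be sketched as follows. This is an illustrative example only: the function and parameter names are hypothetical, and applying the "strictly between 0 and 100" rule to both probability settings is an assumption, since the text refers only to "the probability".

```python
def validate_dynamic_threshold(prob_value_pct, prob_count_pct=None,
                               min_deviation=None, min_observations=None):
    """Return a list of validation errors; an empty list means valid.

    Hypothetical helper for illustration only; not an IQ Ops API.
    None stands for a null (unused) optional parameter.
    """
    errors = []
    if not 0 < prob_value_pct < 100:
        errors.append("probability of observed value must be strictly between 0 and 100")
    # Assumption: the same bounds apply to the observation-count probability.
    if prob_count_pct is not None and not 0 < prob_count_pct < 100:
        errors.append("probability of number of observations must be strictly between 0 and 100")
    if min_deviation is not None and min_deviation <= 0:
        errors.append("required minimum deviation must be positive")
    if min_observations is not None and (
            not isinstance(min_observations, int) or min_observations <= 0):
        errors.append("minimum number of observations must be an integer greater than 0")
    return errors


print(validate_dynamic_threshold(95, prob_count_pct=10, min_deviation=5.0,
                                 min_observations=3))  # valid: prints []
```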

Configure Bounded Dynamic Threshold Settings

To configure bounded dynamic threshold settings, click Edit for a metric that uses bounded dynamic threshold to open the bounded dynamic threshold configuration dialog box.

The Bounded Dynamic Threshold algorithm uses a statistical analysis of the metric's history on the entity being tracked to produce an expected value for the metric every 15 minutes. This analysis can only be applied to metrics that have a bounded range of values, for example, percentages that can only be 0-100%. This algorithm also tracks the number of measurements that were recorded in that 15-minute window in order to filter out noisy measurements.

Bounded dynamic threshold configuration includes the following settings:

  • Probability of Observed Value is Above %: A percentage, between 0 and 100, representing how likely it is that the observed value is normal. Values closer to 100% will result in fewer indicators.

  • Probability of Number of Observations is Above %: A percentage, between 0 and 100, representing how likely it is that the number of observations is normal. Values closer to 100% will result in fewer indicators, but this value should typically be below 20%. If null, this parameter will not be used to restrict the creation of indicators.

  • Required minimum deviation: The metric name and the unit of the required minimum deviation are automatically adjusted based on the metric unit in use. This value must be a positive number.

  • Minimum Tolerance: Uses an absolute tolerance rather than a probability to determine whether an observation is different enough from the expected value to produce an indicator. Larger values make it harder to create an indicator. For example, a Minimum Tolerance value of 5.0 means that, for an expected value of 10.0, the observed value must be greater than 15 (10 + 5.0) or less than 5 (10 - 5.0) to be an indicator. If null, this parameter will not be used to restrict the creation of indicators.

  • Minimum Number of Observations: A threshold on the number of observations; if present, it should be an integer greater than 0. If the number of observations is not above this value, an indicator cannot be created, no matter how unlikely the observation is, given the learned history of the metric. Larger values make it harder to create indicators. If this parameter is null, no threshold on the number of observations will be applied.

  • N of M parameters: These parameters restrict indicator production to only those times when the algorithm has seen anomalous behavior in N out of the last M observations.

  • Validation: The required minimum deviation must be positive. The probability must be strictly higher than 0 and strictly lower than 100.
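
The two non-statistical gates shared by the Dynamic Threshold and Bounded Dynamic Threshold settings, Minimum Number of Observations and Minimum Tolerance, can be sketched as below. This is a simplified illustration under stated assumptions: the function name passes_gates is hypothetical, the expected value is taken as given, and the probability-based test itself is deliberately not modeled because its statistical method is not described here.

```python
def passes_gates(observed, expected, num_observations,
                 min_tolerance=None, min_observations=None):
    """Return True if an observation is eligible to become an indicator.

    Hypothetical helper for illustration only; not an IQ Ops API.
    It covers only the observation-count and absolute-tolerance gates.
    """
    # The number of observations must be above the configured minimum.
    if min_observations is not None and num_observations <= min_observations:
        return False
    # The absolute deviation must be larger than the minimum tolerance.
    if min_tolerance is not None and abs(observed - expected) <= min_tolerance:
        return False
    return True


# Expected 10.0 with Minimum Tolerance 5.0: only values above 15 or below 5 pass.
print(passes_gates(16.0, 10.0, num_observations=20, min_tolerance=5.0))  # True
print(passes_gates(12.0, 10.0, num_observations=20, min_tolerance=5.0))  # False
# Too few observations in the window blocks the indicator regardless of deviation.
print(passes_gates(16.0, 10.0, num_observations=3, min_observations=5))  # False
```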