Aternity Health Event Incident Automation

Riverbed IQ Ops surfaces Aternity health event incidents when it detects anomalies in the health event counts streamed from Aternity. Aternity reports health event counts by application, location, and severity once per hour. IQ Ops tracks the critical and major counts and raises an incident when either count for a particular application and location significantly exceeds its expected value.

This topic explains how IQ Ops processes Aternity health event data and creates incidents, and describes the specialized runbook path that lets you perform root-cause analysis for these incidents.

How Aternity Health Event Tracking Works

IQ Ops receives the hourly health event counts from Aternity and monitors two key metrics:

  • Unique Critical Health Events — The count of distinct critical health events for an application-location combination

  • Unique Major Health Events — The count of distinct major health events for an application-location combination
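To illustrate what "distinct events per application-location combination" means, the following minimal Python sketch counts unique critical and major events from a list of records. It is not how IQ Ops computes the metrics internally; the record keys (application, location, severity, event_id) are hypothetical names chosen for this example.

```python
from collections import defaultdict

def count_unique_events(records):
    """Count distinct health events per (application, location, severity).

    Each record is a dict with hypothetical keys: 'application',
    'location', 'severity', and 'event_id'.
    """
    seen = defaultdict(set)
    for record in records:
        severity = record["severity"].lower()
        if severity not in ("critical", "major"):
            continue  # only critical and major counts are tracked
        key = (record["application"], record["location"], severity)
        seen[key].add(record["event_id"])  # a set ignores repeated event IDs
    return {key: len(event_ids) for key, event_ids in seen.items()}
```

Because event IDs are collected in a set, an event reported several times within the hour is counted once.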

Both metrics use baseline-based anomaly detection. IQ Ops establishes a baseline of expected health event counts for each application-location combination. When the observed count significantly exceeds the baseline (indicating an abnormal increase in health events), IQ Ops creates an incident.
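The baseline algorithm that IQ Ops uses is not described in this topic. As a rough illustration of the general idea only, the sketch below flags an hourly count that exceeds the historical mean by several standard deviations. The function name, the multiplier k, the minimum sample count, and the spread floor are all assumptions made for this example.

```python
from statistics import mean, stdev

def exceeds_baseline(history, observed, k=3.0, min_samples=24):
    """Return True if `observed` is well above the baseline from `history`.

    `history` is a list of past hourly counts for one application-location
    combination. The mean-plus-k-standard-deviations rule is an assumed
    stand-in, not the actual IQ Ops algorithm.
    """
    if len(history) < min_samples:
        return False  # not enough data to establish a baseline
    expected = mean(history)
    # Floor the spread at 1.0 so a flat history does not flag tiny changes.
    spread = max(stdev(history), 1.0)
    return observed > expected + k * spread
```

Only increases are flagged; a count that falls below the baseline does not produce an incident in this sketch, which matches the behavior described above.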

The policies used to track these metrics appear on the Analytics & Threshold Configuration page, in the Applications section. You can enable or disable these policies but, like other baseline policies, you cannot edit them.

Incident Creation

IQ Ops creates an Aternity health event incident when either the critical or major health event count for a particular application and location significantly exceeds its expected baseline. The incident includes:

  • The application and location where the health event increase was detected

  • The severity level (critical or major) that triggered the incident

  • The observed count compared to the expected baseline

  • Timing information about when the anomaly was detected
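The incident contents listed above can be pictured as a simple record. The following Python sketch is purely illustrative: the class name and field names are hypothetical and do not reflect the actual IQ Ops incident schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class HealthEventIncident:
    """Hypothetical model of the information carried by an incident."""
    application: str
    location: str
    severity: str            # "critical" or "major"
    observed_count: int      # count seen in the anomalous hour
    expected_count: float    # baseline value for that hour
    detected_at: datetime    # when the anomaly was detected

    def summary(self) -> str:
        return (
            f"{self.severity.capitalize()} health events for "
            f"{self.application} at {self.location}: observed "
            f"{self.observed_count}, expected about {self.expected_count:g} "
            f"(detected {self.detected_at.isoformat()})"
        )

incident = HealthEventIncident(
    application="Outlook",
    location="Boston",
    severity="critical",
    observed_count=42,
    expected_count=12.5,
    detected_at=datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc),
)
print(incident.summary())
```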

After an incident is created, the associated runbook runs automatically to perform root-cause analysis.

Root-Cause Analysis Runbook

A specialized runbook path lets you perform root-cause analysis for Aternity health event incidents. You can run custom runbooks to generate insights into the root cause of the health event increase.

The runbook can access Aternity data through the Data Store node, querying Aternity Device Health Events (Raw) data to investigate:

  • Specific health events that contributed to the count increase

  • Affected devices and users

  • Event patterns and timing

  • Correlation with other network or application performance issues
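In IQ Ops this investigation is built from runbook nodes rather than written as code. To show the kind of aggregation such a runbook might perform on the raw health event data, here is a minimal Python sketch. The event keys (event_name, device, user, timestamp) are hypothetical and may not match the actual Aternity Device Health Events (Raw) fields.

```python
from collections import Counter

def summarize_raw_events(events, top_n=3):
    """Summarize the raw health events behind a count increase.

    Each event is a dict with hypothetical keys: 'event_name', 'device',
    'user', and 'timestamp' (an ISO 8601 string such as
    '2024-05-01T10:15:00').
    """
    by_event = Counter(e["event_name"] for e in events)
    by_device = Counter(e["device"] for e in events)
    # Truncate the timestamp to 'YYYY-MM-DDTHH' to bucket events by hour.
    by_hour = Counter(e["timestamp"][:13] for e in events)
    return {
        "top_events": by_event.most_common(top_n),
        "top_devices": by_device.most_common(top_n),
        "affected_users": sorted({e["user"] for e in events}),
        "events_per_hour": dict(sorted(by_hour.items())),
    }
```

The result groups the same questions as the list above: which events contributed most, which devices and users were affected, and how the events were distributed over time.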

You can create custom runbooks tailored to your environment's specific needs, or clone and customize existing runbooks to investigate Aternity health event incidents.

Location of Aternity Health Event Configuration

You can find and manage the Aternity health event tracking policies in the Riverbed IQ Ops UI:

  1. From the main menu, navigate to Settings, then select Analytics & Threshold Configuration.

  2. In the Applications section, locate the following metrics:

    • Unique Critical Health Events [Baselining]

    • Unique Major Health Events [Baselining]

  3. Use the toggle controls to enable or disable tracking for each metric as desired.

Note: Like other baseline policies, the Aternity health event policies cannot be edited. You can only enable or disable them.

Viewing Aternity Health Event Incidents

To view incidents created from Aternity health event anomalies:

  1. From the main menu, navigate to Incidents.

  2. Use the filters to search for incidents related to Aternity health events. You can filter by:

    • Entity type: Application

    • Metric: Unique Critical Health Events or Unique Major Health Events

    • Source: Aternity

  3. Click an incident to view its details, including the results of the runbook analysis that ran automatically.