Aternity Health Event Incident Automation

Riverbed IQ Ops surfaces Aternity Health Event Incidents when it detects anomalies in health event counts streaming from Aternity. Aternity tracks the counts of health events by application, location, and severity once per hour. IQ Ops streams this data and tracks the critical and major health event counts, raising incidents when either the critical or major counts for a particular application and location exceed the expected counts by a significant amount.

When the application is N/A for a device health event, IQ Ops tracks that event by health event name and location instead of application and location. For details, see Device Health Events (Application N/A).
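The fallback described above can be pictured as choosing a grouping key for each health event record. The following is a minimal sketch only: the `TrackingKey` type, the `tracking_key` function, and the `"application"` / `"health_event"` labels are hypothetical names chosen for illustration, not part of IQ Ops or Aternity. Treating an empty application value the same as N/A is also an assumption.

```python
from typing import NamedTuple, Optional


class TrackingKey(NamedTuple):
    """Hypothetical grouping key for one tracked series of health events."""
    kind: str      # "application" or "health_event" (illustrative labels)
    name: str      # application name, or health event name in the N/A case
    location: str


def tracking_key(application: Optional[str], event_name: str, location: str) -> TrackingKey:
    """Pick the grouping key for one health event record."""
    # Assumption: a missing, empty, or "N/A" application means "no application".
    if application is None or application.strip().upper() in ("", "N/A"):
        # Device health event with no application: group by event name and location.
        return TrackingKey("health_event", event_name, location)
    # Normal case: group by application and location.
    return TrackingKey("application", application, location)


# Example: the second record has no application, so it is keyed by event name.
print(tracking_key("Outlook", "App Crash", "Boston"))
print(tracking_key("N/A", "High CPU", "Boston"))
```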

This topic explains how IQ Ops processes Aternity health event data and creates incidents, and describes the specialized runbook path that allows users to perform root-cause analysis for these incidents.

How Aternity Health Event Tracking Works

Aternity tracks health event counts by application, location, and severity once per hour. IQ Ops receives this data stream and monitors two key metrics:

- Unique Critical Health Events: The count of distinct critical health events for an application-location combination
- Unique Major Health Events: The count of distinct major health events for an application-location combination
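To make the two metrics concrete, the sketch below computes distinct critical and major health event counts per application-location combination from one hour of records. It is illustrative only: the record fields (`application`, `location`, `severity`, `event_name`) are assumed names, and treating "distinct" as "distinct event name" is an assumption, since this topic does not define how uniqueness is determined.

```python
from collections import defaultdict


def unique_event_counts(records):
    """Return {(application, location, severity): number of distinct event names}.

    Only critical and major severities are counted, mirroring the two
    tracked metrics. Field names are hypothetical.
    """
    seen = defaultdict(set)
    for record in records:
        severity = record["severity"].lower()
        if severity not in ("critical", "major"):
            continue  # other severities are not tracked by these metrics
        key = (record["application"], record["location"], severity)
        seen[key].add(record["event_name"])
    return {key: len(names) for key, names in seen.items()}


# One hour of sample records (fabricated for illustration only).
hour_of_records = [
    {"application": "Outlook", "location": "Boston", "severity": "Critical", "event_name": "App Crash"},
    {"application": "Outlook", "location": "Boston", "severity": "Critical", "event_name": "App Crash"},
    {"application": "Outlook", "location": "Boston", "severity": "Critical", "event_name": "Hang"},
    {"application": "Outlook", "location": "Boston", "severity": "Major", "event_name": "Slow Launch"},
]
print(unique_event_counts(hour_of_records))
```

In this sample, the repeated "App Crash" record is counted once, so the critical count for Outlook in Boston is 2 and the major count is 1.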

Both metrics use baseline-based anomaly detection. IQ Ops establishes a baseline of expected health event counts for each application-location combination. When the observed count significantly exceeds the baseline (indicating an abnormal increase in health events), IQ Ops creates an incident.
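This topic does not describe the baseline model that IQ Ops uses. As a rough illustration of the general idea, the sketch below derives an expected count from recent hourly history and flags an observation only when it rises well above that expectation. The mean-plus-`k`-standard-deviations rule, the `k` and `min_margin` parameters, and the function names are all assumptions made for this example and should not be read as the product's algorithm.

```python
from statistics import mean, pstdev


def upper_bound(history, k=3.0, min_margin=1.0):
    """Highest count still considered normal for this application-location pair.

    history: recent hourly counts for one metric (hypothetical input).
    k: how many standard deviations above the mean are tolerated (assumed).
    min_margin: floor on the tolerance so a perfectly flat history does not
        flag every one-event increase (assumed).
    """
    expected = mean(history)
    margin = max(k * pstdev(history), min_margin)
    return expected + margin


def is_anomalous(history, observed, k=3.0, min_margin=1.0):
    """True only when the observed count exceeds the baseline's upper bound.

    The check is one-sided: unusually low counts are never flagged, matching
    the text above, which raises incidents on increases only.
    """
    return observed > upper_bound(history, k, min_margin)


recent_counts = [2, 3, 2, 4, 3, 2, 3, 3]
print(is_anomalous(recent_counts, 12))  # far above the recent range
print(is_anomalous(recent_counts, 4))   # within the recent range
```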

The policies used to track these metrics appear in the Analytics & Threshold Configuration page, in the Applications section. These policies can be enabled or disabled as desired, but like other baseline policies, they cannot be edited.

Incident Creation

When either the critical or major health event count for a particular application and location exceeds the expected baseline by a significant amount, IQ Ops creates an Aternity Health Event incident. The incident includes:

- The application and location where the health event increase was detected
- The severity level (critical or major) that triggered the incident
- The observed count compared to the expected baseline
- Timing information about when the anomaly was detected
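The items listed above can be thought of as the fields of a small record. The sketch below models them as a Python dataclass to show how the pieces fit together. The class name `HealthEventIncident` and every field name are hypothetical; they do not reflect the actual IQ Ops incident schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HealthEventIncident:
    """Illustrative container for the details an incident carries."""
    application: str
    location: str
    severity: str            # "critical" or "major"
    observed_count: int      # count seen in the anomalous hour
    expected_count: float    # baseline expectation for that hour
    detected_at: datetime    # when the anomaly was detected

    def summary(self) -> str:
        """One-line description comparing observed and expected counts."""
        return (
            f"{self.severity.capitalize()} health events for {self.application} "
            f"at {self.location}: observed {self.observed_count}, "
            f"expected about {self.expected_count:g} "
            f"(detected {self.detected_at.isoformat()})"
        )


# Sample values are fabricated for illustration only.
incident = HealthEventIncident(
    application="Outlook",
    location="Boston",
    severity="critical",
    observed_count=12,
    expected_count=2.75,
    detected_at=datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc),
)
print(incident.summary())
```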

Once an incident is created, an associated runbook automatically executes to perform root-cause analysis.

Root-Cause Analysis Runbook

A specialized runbook path lets users perform root-cause analysis for Aternity health event incidents. Custom runbooks can be executed to generate insights about the root cause of the health event increase.

The runbook can access Aternity data through the Data Store node, querying Aternity Device Health Events (Raw) data to investigate:

- Specific health events that contributed to the count increase
- Affected devices and users
- Event patterns and timing
- Correlation with other network or application performance issues
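As an illustration of the kind of analysis listed above, the sketch below summarizes a set of raw health event records outside the product: which events contributed most, which devices and users were affected, and how events were distributed by hour. It does not use the Data Store node or any IQ Ops API. The record fields (`event_name`, `device`, `user`, `timestamp`) and the function name are assumptions for this example, and the real raw data may be shaped differently.

```python
from collections import Counter
from datetime import datetime


def summarize_raw_events(events):
    """Summarize raw health event records for root-cause investigation.

    events: list of dicts with hypothetical keys "event_name", "device",
        "user", and an ISO 8601 "timestamp".
    """
    by_event = Counter(e["event_name"] for e in events)
    by_hour = Counter(
        datetime.fromisoformat(e["timestamp"]).strftime("%Y-%m-%d %H:00")
        for e in events
    )
    return {
        # Events ordered from most to least frequent.
        "top_events": by_event.most_common(),
        "devices": sorted({e["device"] for e in events}),
        "users": sorted({e["user"] for e in events}),
        # Event counts per hour, in chronological order.
        "events_per_hour": sorted(by_hour.items()),
    }


# Sample records are fabricated for illustration only.
raw_events = [
    {"event_name": "App Crash", "device": "LT-001", "user": "alice", "timestamp": "2024-05-01T09:15:00"},
    {"event_name": "App Crash", "device": "LT-002", "user": "bob", "timestamp": "2024-05-01T09:40:00"},
    {"event_name": "Hang", "device": "LT-001", "user": "alice", "timestamp": "2024-05-01T10:05:00"},
]
summary = summarize_raw_events(raw_events)
print(summary["top_events"])
print(summary["devices"], summary["users"])
print(summary["events_per_hour"])
```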

You can create custom runbooks tailored to your environment's specific needs, or clone and customize existing runbooks to investigate Aternity health event incidents.

Location of Aternity Health Event Configuration

You can find and manage the Aternity health event tracking policies in the Riverbed IQ Ops UI:

From the main menu, navigate to Settings, then select Analytics & Threshold Configuration.
In the Applications section, locate the following metrics:
- Unique Critical Health Events [Baselining]
- Unique Major Health Events [Baselining]
Use the toggle controls to enable or disable tracking for each metric as desired.

Note: Like other baseline policies, the Aternity health event policies cannot be edited. You can only enable or disable them.

Viewing Aternity Health Event Incidents

To view incidents created from Aternity health event anomalies:

From the main menu, navigate to Incidents.
Use the filters to search for incidents related to Aternity health events. You can filter by:
- Entity type: Application
- Metric: Unique Critical Health Events or Unique Major Health Events
- Source: Aternity
Click an incident to view its details, including the runbook analysis that was automatically executed.