Aternity Health Event Metrics

The Unique Critical Health Events and Unique Major Health Events metrics are two Analytics policies that track health event counts streaming from Aternity. They appear in the Applications section of the Analytics & Threshold Configuration page. This topic explains what these metrics are, why they matter, how they function, and where to enable or disable them.

What the metrics are

Aternity tracks health events by application, location, and severity once per hour. Each metric is a count of distinct (unique) health events in that hour for an application-location combination:

Unique Critical Health Events: The count of distinct critical-severity health events for an application and location.
Unique Major Health Events: The count of distinct major-severity health events for an application and location.
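
The per-hour distinct counting described above can be sketched in Python. This is only an illustration, not the Aternity or IQ Ops implementation: the record fields (timestamp, application, location, severity, event_name) are hypothetical, and it assumes that two events are "the same" when they share an event name.

```python
from collections import defaultdict
from datetime import datetime


def unique_event_counts(events):
    """Count distinct event names per (hour, application, location, severity).

    `events` is an iterable of dicts with hypothetical keys:
    timestamp (datetime), application, location, severity, event_name.
    """
    seen = defaultdict(set)
    for event in events:
        # Truncate the timestamp to the start of its hour.
        hour = event["timestamp"].replace(minute=0, second=0, microsecond=0)
        key = (hour, event["application"], event["location"], event["severity"])
        # A set keeps each event name once, so repeats are not double counted.
        seen[key].add(event["event_name"])
    return {key: len(names) for key, names in seen.items()}


# Hypothetical sample data: the second record repeats the first within the hour.
sample_events = [
    {"timestamp": datetime(2024, 1, 1, 10, 5), "application": "Mail",
     "location": "Boston", "severity": "critical", "event_name": "App Crash"},
    {"timestamp": datetime(2024, 1, 1, 10, 40), "application": "Mail",
     "location": "Boston", "severity": "critical", "event_name": "App Crash"},
    {"timestamp": datetime(2024, 1, 1, 10, 50), "application": "Mail",
     "location": "Boston", "severity": "critical", "event_name": "App Hang"},
    {"timestamp": datetime(2024, 1, 1, 11, 10), "application": "Mail",
     "location": "Boston", "severity": "critical", "event_name": "App Crash"},
]
counts = unique_event_counts(sample_events)
```

In this sketch, filtering the result on severity "critical" or "major" would correspond to the Unique Critical Health Events and Unique Major Health Events counts, respectively.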

IQ Ops receives this data stream and monitors both counts. If the application is N/A for a device health event, tracking uses health event name and location instead of application and location. For details, see Device Health Events (Application N/A).
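
The fallback for device health events can be pictured as a choice of tracking key. The following Python sketch is hypothetical: it assumes the application field carries the literal string "N/A" and that records expose application, event_name, and location fields, none of which is confirmed by this topic.

```python
def tracking_key(event):
    """Return the pair of values used to group a health event.

    Assumption: an application value of "N/A" marks a device health event.
    """
    if event["application"] == "N/A":
        # Device health event: group by health event name and location.
        return ("event_name", event["event_name"], event["location"])
    # Application health event: group by application and location.
    return ("application", event["application"], event["location"])


device_event = {"application": "N/A", "event_name": "Low Disk Space", "location": "Boston"}
app_event = {"application": "Mail", "event_name": "App Crash", "location": "Boston"}
```

Tagging the key with its kind ("event_name" or "application") keeps an application that happens to share a name with a health event from being grouped together with it.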

Why these metrics matter

Health events from Aternity reflect real problems on end-user devices or in application experience (for example, crashes, hangs, and errors). When the count of critical or major events spikes above what is normal for a given application and location, you want to know so you can investigate and respond. When that happens, these two metrics drive indicators and, in turn, incidents, so you get automated alerts and can run runbooks for root-cause analysis instead of relying on manual monitoring.

How the metrics function

Both metrics use baseline-based anomaly detection. IQ Ops establishes an expected count for each application-location combination. When the observed count in a given hour significantly exceeds that baseline, the Analytics service generates an indicator, and IQ Ops creates an incident. The incident identifies:

The application and location.
The severity (critical or major).
The observed count versus the expected baseline.
When the anomaly was detected.
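
To make the idea of a baseline comparison concrete, here is a minimal Python sketch. This topic does not describe the model that IQ Ops actually uses, so everything here is an assumption for illustration: the baseline is taken as the mean of past hourly counts, and "significantly exceeds" is approximated as the mean plus k standard deviations. The function name and its parameters (k, min_spread) are hypothetical.

```python
from statistics import mean, pstdev


def check_hourly_count(history, observed, k=3.0, min_spread=1.0):
    """Compare an observed hourly count with a simple statistical baseline.

    history:    past hourly counts for one application-location combination
    observed:   the count for the hour being evaluated
    k:          how many standard deviations above the mean count as anomalous
    min_spread: floor on the deviation so a flat history does not flag tiny changes
    """
    expected = mean(history)
    spread = max(pstdev(history), min_spread)
    threshold = expected + k * spread
    return {
        "expected": expected,
        "observed": observed,
        "threshold": threshold,
        # Only counts above the baseline are flagged, mirroring the text above.
        "is_anomaly": observed > threshold,
    }


history = [2, 3, 2, 3, 2, 3]  # hypothetical past hourly counts
spike = check_hourly_count(history, observed=12)
normal = check_hourly_count(history, observed=4)
```

The returned dictionary carries the observed count and the expected baseline side by side, which is the same comparison the incident reports.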

As with other baseline policies, you cannot edit these two policies. You can only enable or disable each metric. Disabling a metric stops indicator and incident creation for that severity across all application-location combinations.

Where to configure the metrics

The policies for Unique Critical Health Events and Unique Major Health Events appear on the Analytics & Threshold Configuration page, in the Applications section. The page lists them as Unique Critical Health Events [Baselining] and Unique Major Health Events [Baselining]. Use the toggle in the Analytics column to enable or disable each policy. For navigation to the page and the full list of metrics by section, see Analytics configuration sections and Analytics & Threshold Configuration.