Multi-Device Down Issue Incident Automation

Riverbed IQ Ops surfaces Multi-Device Down Issue Incidents when anomalies are detected in Key Measurements (e.g. Device Status) streaming from three (3) or more Entities of type Device.

Note: The indicators in the Analytics & Threshold Configuration page must be enabled for the automation to function. These indicators are enabled by default.

Workflow and Processing Logic of the Multi-Device Down Issue

This section reviews the “Multi-Device Down Analysis” Runbook and gives a high-level explanation of its Initialization, its parallel paths of execution (red tags on the left), and its logic branches (dark tabs on the right).

Automation Name: “Multi-Device Down Analysis” - Annotated

Entry: Initialization

All Runbooks require a Triggering Entity as an “entry-point”. This entry-point provides a mechanism for Riverbed IQ Ops to pass all data/context gathered by the Analytics Pipeline (which generated the Incident) into the Runbook Automation. Multi-Device Down Issues are triggered only when three (3) or more Devices are affected. For Triggering Entity: Devices (labeled “Device Issue”), the gathered Analytics Pipeline data/context includes Primary and Correlated Indicators, which can include the following Key Measurements:

  1. Device Status

  2. Interface Status

  3. In Packet Error Rate

  4. Out Packet Error Rate

  5. In Packet Drops Rate

  6. Out Packet Drops Rate

  7. In Utilization

  8. Out Utilization
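The trigger condition described above (anomalies on three or more distinct Devices) can be sketched as follows. This is a minimal illustration, not Riverbed IQ Ops code: the `Indicator` class, the `should_trigger` function, and the device names are hypothetical, and only the threshold of three and the Key Measurement names come from this document.

```python
from dataclasses import dataclass
from typing import Iterable

# From the text: the Incident is raised for three (3) or more affected Devices.
MIN_AFFECTED_DEVICES = 3


@dataclass(frozen=True)
class Indicator:
    """Hypothetical record of one anomalous Key Measurement on one Device."""
    device: str        # hypothetical device identifier
    measurement: str   # e.g. "Device Status", "Interface Status"


def should_trigger(indicators: Iterable[Indicator]) -> bool:
    """Return True when the anomalies span at least three distinct Devices."""
    affected_devices = {indicator.device for indicator in indicators}
    return len(affected_devices) >= MIN_AFFECTED_DEVICES


indicators = [
    Indicator("rtr-1", "Device Status"),
    Indicator("rtr-1", "Interface Status"),  # same Device, second measurement
    Indicator("sw-2", "Device Status"),
]
print(should_trigger(indicators))  # two distinct Devices, below the threshold

indicators.append(Indicator("fw-3", "Device Status"))
print(should_trigger(indicators))  # three distinct Devices, threshold met
```

Counting distinct Devices rather than individual indicators reflects the wording above: several anomalous measurements on one Device still count as a single affected Device.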

This entry-point kicks off four (4) parallel paths of execution.

Path-1: Initial Prioritization

Immediately sets the initial Incident Priority for the Multi-Device Down Issue to “High” (this may change over the course of Runbook execution).

Path-2: Impacted Locations

Immediately tags the Locations of the source Entities as impacted.

Path-3: Perform Multi-Device Investigation

Captures/visualizes the list of affected Devices and determines whether the Incident affects critical Devices (e.g. Gateways, Firewalls, SD-WAN devices, Routers, or Multi-layer Switches), Host Servers, or lower-priority Devices. It then drills deeper to gather the additional data/context (e.g. Devices, Applications, Users, …) needed to better assess prioritization and impact:

  1. For critical Devices (i.e. marked Is Gateway, or Device Type is one of: Gateways, Firewalls, SD-WAN devices, Routers, or Multi-layer Switches):

    1. If the affected Device is marked as Is Gateway, escalates to Priority: Critical.

    2. Gathers available Application Flow Data:

      If no Flow Data is available, sets Priority: Moderate.
      1. Captures associated Application information:

        1. Tags them as Affected Applications

        2. Visualizes them as a Bar Chart.

      2. Searches for associated Client Hosts to assess potentially impacted Users:

        1. Tags them as Affected Users.

        2. Visualizes them as a Table.

        3. Sets Incident Priority according to the number of potentially impacted Users, i.e.

          • “Less than 20 users were impacted by this event.” > Set Priority To: Moderate.

          • “Less than 40 users were impacted by this event.” > Set Priority To: High.

          • “More than 40 users were impacted by this event.” > Set Priority To: Critical.

  2. For Host Servers:

    1. Because the affected Device Type is Host, sets Priority: High.

    2. Gathers any available Host-related Data:

      1. Converts Devices into associated Hosts:

        1. This step uses the Subflow DevicesToHosts, which maps Devices to Hosts and passes the result back in a runtime variable.

      2. Gathers Host-related Application data:

        1. Tags associated Application data as impacted.

        2. Visualizes associated Application data as a Bar Chart.

      3. Gathers Host-related User data:

        1. Visualizes Server Hosts as a Table.

        2. Finds “Client-Server Pairs” to derive User data:

          1. Visualizes the associated IP Conversations as a Table.

          2. Aggregates User data by Client:

            1. Sets Incident Priority according to the number of potentially impacted Users, i.e.

              • “Less than 20 users were impacted by this event.” > Set Priority To: Moderate.

              • “Less than 40 users were impacted by this event.” > Set Priority To: High.

              • “More than 40 users were impacted by this event.” > Set Priority To: Critical.

            2. Visualizes associated Client data as a Table of impacted Users.

          3. Aggregates User data by Client IP:

            1. Tags associated Client IP data as impacted Users.

  3. For lower-priority Devices: Sets Incident Priority for the Multi-Device Down Issue to High.
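The Path-3 branching above can be sketched as a function that lists the "Set Priority" steps a branch would perform, in the order this document gives them. This is a hedged illustration only: `Priority`, `CRITICAL_TYPES`, `priority_from_users`, and `path3_priority_steps` are hypothetical names, and the Device Type strings are placeholders rather than the product's actual values. Two assumptions are made because the text does not settle them: a count of exactly 40 users is treated as Critical, and the sketch does not decide which step wins when a branch sets the Priority more than once.

```python
from enum import Enum
from typing import List, Optional


class Priority(Enum):
    LOW = "Low"
    MODERATE = "Moderate"
    HIGH = "High"
    CRITICAL = "Critical"


# Placeholder labels for the critical Device Types named in the text.
CRITICAL_TYPES = {"Gateway", "Firewall", "SD-WAN", "Router", "Multi-layer Switch"}


def priority_from_users(user_count: int) -> Priority:
    """Map the number of potentially impacted Users to a Priority."""
    if user_count < 20:
        return Priority.MODERATE
    if user_count < 40:
        return Priority.HIGH
    return Priority.CRITICAL  # assumption: exactly 40 is treated like "more than 40"


def path3_priority_steps(device_type: str,
                         is_gateway: bool = False,
                         user_count: Optional[int] = None) -> List[Priority]:
    """List the Set Priority steps for one Device, in document order.

    user_count=None stands for "no Flow Data / User data available".
    """
    steps: List[Priority] = []
    if is_gateway or device_type in CRITICAL_TYPES:
        if is_gateway:
            steps.append(Priority.CRITICAL)
        if user_count is None:
            steps.append(Priority.MODERATE)   # no Application Flow Data
        else:
            steps.append(priority_from_users(user_count))
    elif device_type == "Host":
        steps.append(Priority.HIGH)
        if user_count is not None:
            steps.append(priority_from_users(user_count))
    else:
        steps.append(Priority.HIGH)           # lower-priority Devices
    return steps


for args in [("Router", False, 25), ("Gateway", True, None),
             ("Host", False, 5), ("Printer", False, None)]:
    print(args, [p.value for p in path3_priority_steps(*args)])
```

Returning a list of steps rather than a single value keeps the sketch faithful to the text, which describes each priority-setting action but not how the Runbook reconciles them.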

Path-4: Gather Location-to-Location Conversations Context

Captures/visualizes additional data/context related to the Incident:

  1. Gathers any available “Location-to-Location Conversations”:

    1. If any are found, visualizes the Conversations as a Table.

    2. If none are found, sets Incident Priority for the Multi-Device Down Issue to Low.
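The two outcomes of Path-4 can be sketched as a small branch. The function name `path4_actions`, the action labels, and the location pairs are hypothetical; only the branch itself (show a Table when Conversations exist, otherwise set the Priority to Low) is taken from the steps above.

```python
from typing import List, Sequence, Tuple

# Hypothetical shape: one (source Location, destination Location) pair per Conversation.
Conversation = Tuple[str, str]


def path4_actions(conversations: Sequence[Conversation]) -> List[tuple]:
    """Return the action Path-4 would take for the gathered Conversations."""
    if conversations:
        # Conversations were found: visualize them as a Table.
        return [("visualize_table", list(conversations))]
    # No Conversations were found: set the Incident Priority to Low.
    return [("set_priority", "Low")]


print(path4_actions([("Site-A", "Site-B"), ("Site-A", "Site-C")]))
print(path4_actions([]))
```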

Location of the Multi-Device Down Issue in Riverbed IQ Ops

You can find the Multi-Device Down Issue Incident Runbook automation in the Riverbed IQ Ops UI. From the main menu:

  1. Mouse-over Automation, then select Automation Management to open the Automation Management page.

    • The Automation Management page contains a summary view of all supported Automations and their associated Runbooks.

  2. On the Automation Management page, in the “New Incident Triggers” area, click the Multi-Device Down Issue panel to open a detail view for this type of Automation.

    • Each row in this detail-view represents an Automation that can execute to investigate this type of Incident.

  3. Find the row where Automation Name is “Multi-Device Down Analysis”, and click the cell where Runbook is “Multi-Device Down Analysis” to open this out-of-the-box automation in the Runbook Editor and see its constituent nodes and structure (refer to the diagram below).

Constituent Nodes and Structure

This section explains the constituent nodes and structure of the Multi-Device Down Issue Incident Automation. The following diagram shows all of the automation's nodes and how they are connected.

The automation (an automated procedure executed as the result of a trigger, consisting of a single entry point and a sequence of connected nodes that define the processing logic) contains: