Alarm Types

alarm-type
    cdb-offload-threshold-too-low
    certificate-expiration
    ha-alarm
        ha-node-down-alarm
            ha-primary-down
            ha-secondary-down
    ncs-cluster-alarm
        cluster-subscriber-failure
    ncs-dev-manager-alarm
        abort-error
        bad-user-input
        commit-through-queue-blocked
        commit-through-queue-failed
        commit-through-queue-failed-transiently
        commit-through-queue-rollback-failed
        configuration-error
        connection-failure
        final-commit-error
        missing-transaction-id
        ned-live-tree-connection-failure
        out-of-sync
        revision-error
    ncs-package-alarm
        package-load-failure
        package-operation-failure
    ncs-service-manager-alarm
        service-activation-failure
    ncs-snmp-notification-receiver-alarm
        receiver-configuration-error
    time-violation-alarm
        transaction-lock-time-violation

Alarm Type Descriptions

abort-error
  • Initial Perceived Severity major

  • Description An error happened while aborting or reverting a transaction. Device's configuration is likely to be inconsistent with the NCS CDB.

  • Recommended Action Inspect the configuration difference with compare-config, resolve conflicts with sync-from or sync-to if any.

  • Clear Condition(s) If NCS achieves sync with the device, or receives a transaction id for a netconf session towards the device, the alarm is cleared.

  • Alarm Message(s)

    • Device {dev} is locked

    • Device {dev} is southbound locked

    • abort error

alarm-type
  • Description Base identity for alarm types. A unique identification of the fault, not including the managed object. Alarm types are used to identify if alarms indicate the same problem or not, for lookup into external alarm documentation, etc. Different managed object types and instances can share alarm types. If the same managed object reports the same alarm type, it is to be considered to be the same alarm. The alarm type is a simplification of the different X.733 and 3GPP alarm IRP alarm correlation mechanisms and it allows for hierarchical extensions. A 'specific-problem' can be used in addition to the alarm type in order to have different alarm types based on information not known at design-time, such as values in textual SNMP Notification varbinds.

bad-user-input
  • Initial Perceived Severity critical

  • Description Invalid input from user. NCS cannot recognize parameters needed to connect to device.

  • Recommended Action Verify that the user supplied input are correct.

  • Clear Condition(s) This alarm is not cleared.

  • Alarm Message(s)

    • Resource {resource} doesn't exist

cdb-offload-threshold-too-low
  • Description The CDB Offload threshold configuration is set too low, causing the CDB memory footprint to reach the threshold even when there is no offloadable data present in the memory. The severity is warning.

  • Recommended Action If system memory is sufficient, increase the threshold value, otherwise increase the system memory capacity.

  • Clear Condition(s) This alarm is cleared when CDB offload can lower the CDB memory footprint below the configured threshold value.

  • Alarm Message(s)

    • Too low /config/cdb/persistence/offload/threshold value.

certificate-expiration
  • Description The certificate is nearing its expiry or has already expired. The severity depends on the time left to expiry, it ranges from warning to critical.

  • Recommended Action Replace certificate.

  • Clear Condition(s) This alarm is cleared when the certificate is no longer loaded.

  • Alarm Message(s)

    • Certificate expires in less than {days} day(s)/Certificate has expired.

cluster-subscriber-failure
  • Initial Perceived Severity critical

  • Description Failure to establish a notification subscription towards a remote node.

  • Recommended Action Verify IP connectivity between cluster nodes.

  • Clear Condition(s) This alarm is cleared if NCS succeeds to establish a subscription towards the remote node, or when the subscription is explicitly stopped.

  • Alarm Message(s)

    • Failed to establish netconf notification subscription to node ~s, stream ~s

    • Commit queue items with remote nodes will not receive required event notifications.

commit-through-queue-blocked
  • Initial Perceived Severity warning

  • Description A commit was queued behind a queue item waiting to be able to connect to one of its devices. This is potentially dangerous since one unreachable device can potentially fill up the commit queue indefinitely.

  • Clear Condition(s) An alarm raised due to a transient error will be cleared when NCS is able to reconnect to the device.

  • Alarm Message(s)

    • Commit queue item ~p is blocked because item ~p cannot connect to ~s

commit-through-queue-failed
  • Initial Perceived Severity critical

  • Description A queued commit failed.

  • Recommended Action Resolve with rollback if possible.

  • Clear Condition(s) This alarm is not cleared.

  • Alarm Message(s)

    • Failed to authenticate towards device {device}: {reason}

    • Device {dev} is locked

    • {Reason}

    • Device {dev} is southbound locked

    • Commit queue item {CqId} rollback invoked

    • Commit queue item {CqId} has failed: Operation failed because: inconsistent database

    • Remote commit queue item ~p cannot be unlocked: cluster node not configured correctly

commit-through-queue-failed-transiently
  • Initial Perceived Severity critical

  • Description A queued commit failed as it exhausted its retry attempts on transient errors.

  • Recommended Action Resolve with rollback if possible.

  • Clear Condition(s) This alarm is not cleared.

  • Alarm Message(s)

    • Failed to connect to device {dev}: {reason}

    • Connection to {dev} timed out

    • Failed to authenticate towards device {device}: {reason}

    • The configuration database is locked for device {dev}: {reason}

    • the configuration database is locked by session {id} {identification}

    • the configuration database is locked by session {id} {identification}

    • {Dev}: Device is locked in a {Op} operation by session {session-id}

    • resource denied

    • Commit queue item {CqId} rollback invoked

    • Commit queue item {CqId} has failed: Operation failed because: inconsistent database

    • Remote commit queue item ~p cannot be unlocked: cluster node not configured correctly

commit-through-queue-rollback-failed
  • Initial Perceived Severity critical

  • Description Rollback of a commit-queue item failed.

  • Recommended Action Investigate the status of the device and resolve the situation by issuing the appropriate action, i.e., service redeploy or a sync operation.

  • Clear Condition(s) This alarm is not cleared.

  • Alarm Message(s)

    • {Reason}

configuration-error
  • Initial Perceived Severity critical

  • Description Invalid configuration of NCS managed device, NCS cannot recognize parameters needed to connect to device.

  • Recommended Action Verify that the configuration parameters defined in tailf-ncs-devices.yang submodule are consistent for this device.

  • Clear Condition(s) The alarm is cleared when NCS reads the configuration parameters for the device, and is raised again if the parameters are invalid.

  • Alarm Message(s)

    • Failed to resolve IP address for {dev}

    • the configuration database is locked by session {id} {identification}

    • {Reason}

    • Resource {resource} doesn't exist

connection-failure
  • Initial Perceived Severity major

  • Description NCS failed to connect to a managed device before the timeout expired.

  • Recommended Action Verify address, port, authentication, check that the device is up and running. If the error occurs intermittently, increase connect-timeout.

  • Clear Condition(s) If NCS successfully reconnects to the device, the alarm is cleared.

  • Alarm Message(s)

    • The connection to {dev} was closed

    • Failed to connect to device {dev}: {reason}

final-commit-error
  • Initial Perceived Severity critical

  • Description A managed device validated a configuration change, but failed to commit. When this happens, NCS and the device are out of sync.

  • Recommended Action Reconcile by comparing and sync-from or sync-to.

  • Clear Condition(s) If NCS achieves sync with a device, the alarm is cleared.

  • Alarm Message(s)

    • The connection to {dev} was closed

    • External error in the NED implementation for device {dev}: {reason}

    • Internal error in the NED NCS framework affecting device {dev}: {reason}

ha-alarm
  • Description Base type for all alarms related to high availablity. This is never reported, sub-identities for the specific high availability alarms are used in the alarms.

ha-node-down-alarm
  • Description Base type for all alarms related to nodes going down in high availablity. This is never reported, sub-identities for the specific node down alarms are used in the alarms.

ha-primary-down
  • Initial Perceived Severity critical

  • Description The node lost the connection to the primary node.

  • Recommended Action Make sure the HA cluster is operational, investigate why the primary went down and bring it up again.

  • Clear Condition(s) This alarm is never automatically cleared and has to be cleared manually when the HA cluster has been restored.

  • Alarm Message(s)

    • Lost connection to primary due to: Primary closed connection

    • Lost connection to primary due to: Tick timeout

    • Lost connection to primary due to: code {Code}

ha-secondary-down
  • Initial Perceived Severity critical

  • Description The node lost the connection to a secondary node.

  • Recommended Action Investigate why the secondary node went down, fix the connectivity issue and reconnect the secondary to the HA cluster.

  • Clear Condition(s) This alarm is cleared when the secondary node is reconnected to the HA cluster.

  • Alarm Message(s)

    • Lost connection to secondary

missing-transaction-id
  • Initial Perceived Severity warning

  • Description A device announced in its NETCONF hello message that it supports the transaction-id as defined in http://tail-f.com/yang/netconf-monitoring. However when NCS tries to read the transaction-id no data is returned. The NCS check-sync feature will not work. This is usually a case of misconfigured NACM rules on the managed device.

  • Recommended Action Verify NACM rules on the concerned device.

  • Clear Condition(s) If NCS successfully reads a transaction id for which it had previously failed to do so, the alarm is cleared.

  • Alarm Message(s)

    • {Reason}

ncs-cluster-alarm
  • Description Base type for all alarms related to cluster. This is never reported, sub-identities for the specific cluster alarms are used in the alarms.

ncs-dev-manager-alarm
  • Description Base type for all alarms related to the device manager This is never reported, sub-identities for the specific device alarms are used in the alarms.

ncs-package-alarm
  • Description Base type for all alarms related to packages. This is never reported, sub-identities for the specific package alarms are used in the alarms.

ncs-service-manager-alarm
  • Description Base type for all alarms related to the service manager This is never reported, sub-identities for the specific service alarms are used in the alarms.

ncs-snmp-notification-receiver-alarm
  • Description Base type for SNMP notification receiver Alarms. This is never reported, sub-identities for specific SNMP notification receiver alarms are used in the alarms.

ned-live-tree-connection-failure
  • Initial Perceived Severity major

  • Description NCS failed to connect to a managed device using one of the optional live-status-protocol NEDs.

  • Recommended Action Verify the configuration of the optional NEDs. If the error occurs intermittently, increase connect-timeout.

  • Clear Condition(s) If NCS successfully reconnects to the managed device, the alarm is cleared.

  • Alarm Message(s)

    • The connection to {dev} was closed

    • Failed to connect to device {dev}: {reason}

out-of-sync
  • Initial Perceived Severity major

  • Description A managed device is out of sync with NCS. Usually it means that the device has been configured out of band from NCS point of view.

  • Recommended Action Inspect the difference with compare-config, reconcile by invoking sync-from or sync-to.

  • Clear Condition(s) If NCS achieves sync with a device, the alarm is cleared.

  • Alarm Message(s)

    • Device {dev} is out of sync

    • Out of sync due to no-networking or failed commit-queue commits.

    • got: ~s expected: ~s.

package-load-failure
  • Initial Perceived Severity critical

  • Description NCS failed to load a package.

  • Recommended Action Check the package for the reason.

  • Clear Condition(s) If NCS successfully loads a package for which an alarm was previously raised, it will be cleared.

  • Alarm Message(s)

    • failed to open file {file}: {str}

    • Specific to the concerned package.

package-operation-failure
  • Initial Perceived Severity critical

  • Description A package has some problem with its operation.

  • Recommended Action Check the package for the reason.

  • Clear Condition(s) This alarm is not cleared.

receiver-configuration-error
  • Initial Perceived Severity major

  • Description The snmp-notification-receiver could not setup its configuration, either at startup or when reconfigured. SNMP notifications will now be missed.

  • Recommended Action Check the error-message and change the configuration.

  • Clear Condition(s) This alarm will be cleared when the NCS is configured to successfully receive SNMP notifications

  • Alarm Message(s)

    • Configuration has errors.

revision-error
  • Initial Perceived Severity major

  • Description A managed device arrived with a known module, but too new revision.

  • Recommended Action Upgrade the Device NED using the new YANG revision in order to use the new features in the device.

  • Clear Condition(s) If all device yang modules are supported by NCS, the alarm is cleared.

  • Alarm Message(s)

    • The device has YANG module revisions not supported by NCS. Use the /devices/device/check-yang-modules action to check which modules that are not compatible.

service-activation-failure
  • Initial Perceived Severity critical

  • Description A service failed during re-deploy.

  • Recommended Action Corrective action and another re-deploy is needed.

  • Clear Condition(s) If the service is successfully redeployed, the alarm is cleared.

  • Alarm Message(s)

    • Multiple device errors: {str}

time-violation-alarm
  • Description Base type for all alarms related to time violations. This is never reported, sub-identities for the specific time violation alarms are used in the alarms.

transaction-lock-time-violation
  • Initial Perceived Severity warning

  • Description The transaction lock time exceeded its threshold and might be stuck in the critical section. This threshold is configured in /ncs-config/transaction-lock-time-violation-alarm/timeout.

  • Recommended Action Investigate if the transaction is stuck and possibly interrupt it by closing the user session which it is attached to.

  • Clear Condition(s) This alarm is cleared when the transaction has finished.

  • Alarm Message(s)

    • Transaction lock time exceeded threshold.

Last updated