Perform operations on NSO.
Learn about NSO SSH key management.
The SSH protocol uses public key technology for two distinct purposes:
Server Authentication: This use is a mandatory part of the protocol. It allows an SSH client to authenticate the server, i.e. verify that it is really talking to the intended server and not some man-in-the-middle intruder. This requires that the client has prior knowledge of the server's public keys, and the server proves its possession of one of the corresponding private keys by using it to sign some data. These keys are normally called 'host keys', and the authentication procedure is typically referred to as 'host key verification' or 'host key checking'.
Client Authentication: This use is one of several possible client authentication methods, i.e. it is an alternative to the commonly used password authentication. The server is configured with one or more public keys which are authorized for authentication of a user. The client proves possession of one of the corresponding private keys by using it to sign some data - i.e. the exact reverse of the server authentication provided by host keys. The method is called 'public key authentication' in SSH terminology.
These two usages are fundamentally independent, i.e. host key verification is done regardless of whether the client authentication is via public key, password, or some other method. However, host key verification is of particular importance when client authentication is done via password, since failure to detect a man-in-the-middle attack in this case will result in the cleartext password being divulged to the attacker.
NSO can act as an SSH server for northbound connections to the CLI or the NETCONF agent, and for connections from other nodes in an NSO cluster - cluster connections use NETCONF, and the server side setup used is the same as for northbound connections to the NETCONF agent. It is possible to use either the NSO built-in SSH server or an external server such as OpenSSH, for all of these cases. When using an external SSH server, host keys for server authentication and authorized keys for client/user authentication need to be set up per the documentation for that server, and there is no NSO-specific key management in this case.
When the NSO built-in SSH server is used, the setup is very similar to the one OpenSSH uses:
The private host key(s) must be placed in the directory specified by /ncs-config/aaa/ssh-server-key-dir
in ncs.conf
, and named either ssh_host_dsa_key
(for a DSA key) or ssh_host_rsa_key
(for an RSA key). The key(s) must be in PEM format (e.g. as generated by the OpenSSH ssh-keygen command), and must not be encrypted - protection can be achieved by file system permissions (not enforced by NSO). The corresponding public key(s) is/are typically stored in the same directory with a .pub
extension to the file name, but they are not used by NSO. The NSO installation creates a DSA private/public key pair in the directory specified by the default ncs.conf
.
The public keys that are authorized for authentication of a given user must be placed in the user's SSH directory. Refer to Public Key Login for details on how NSO searches for the keys to use.
NSO can act as an SSH client for connections to managed devices that use SSH (this is always the case for devices accessed via NETCONF, typically also for devices accessed via CLI), and for connections to other nodes in an NSO cluster. In all cases, a built-in SSH client is used. The $NCS_DIR/examples.ncs/getting-started/using-ncs/8-ssh-keys
example in the NSO example collection has a detailed walk-through of the NSO functionality that is described in this section.
The level of host key verification can be set globally via /ssh/host-key-verification
. The possible values are:
reject-unknown
: The host key provided by the device or cluster node must be known by NSO for the connection to succeed.
reject-mismatch
: The host key provided by the device or cluster node may be unknown, but it must not be different from the "known" key for the same key algorithm, for the connection to succeed.
none
: No host key verification is done - the connection will never fail due to the host key provided by the device or cluster node.
The default is reject-unknown
, and it is not recommended to use a different value, although it can be useful or needed in certain circumstances. E.g. none
may be useful in a development scenario, and temporary use of reject-mismatch
may be warranted until host keys have been configured for a set of existing managed devices.
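The three verification levels can be read as a small decision procedure. The following Python sketch is illustrative only and is not NSO code: the `Level` enum, the `accept_host_key` function, and the dictionary of known keys are hypothetical names chosen for this example, under the assumption that keys are compared per key algorithm as described above.

```python
from enum import Enum


class Level(Enum):
    REJECT_UNKNOWN = "reject-unknown"
    REJECT_MISMATCH = "reject-mismatch"
    NONE = "none"


def accept_host_key(level, known_keys, algorithm, presented_key):
    """Return True if the connection may proceed.

    known_keys is a hypothetical mapping from a key algorithm name
    (e.g. "ssh-rsa") to the stored public key for that algorithm.
    """
    if level is Level.NONE:
        # No verification: the host key never causes a failure.
        return True
    known = known_keys.get(algorithm)
    if known is None:
        # No stored key for this algorithm: only reject-mismatch tolerates it.
        return level is Level.REJECT_MISMATCH
    # A stored key exists: the presented key must match it.
    return known == presented_key
```

With `reject-mismatch`, a device with no stored key is accepted, but a device presenting a key that differs from the stored one is still rejected.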
The public host keys for a device that is accessed via SSH are stored in the /devices/device/ssh/host-key
list. There can be several keys in this list, one each for the ssh-ed25519
(ED25519 key), ssh-dss
(DSA key) and ssh-rsa
(RSA key) key algorithms. In case a device has entries in its live-status-protocol
list that use SSH, the host keys for those can be stored in the /devices/device/live-status-protocol/ssh/host-key
list, in the same way as the device keys - however if /devices/device/live-status-protocol/ssh
does not exist, the keys from /devices/device/ssh/host-key
are used for that protocol. The keys can be configured e.g. via input directly in the CLI, but in most cases, it will be preferable to use the actions described below to retrieve keys from the devices. These actions will also retrieve any live-status-protocol
keys for a device.
The level of host key verification can also be set per device, via /devices/device/ssh/host-key-verification
. The default is to use the global value (or default) for /ssh/host-key-verification
, but any explicitly set value will override the global value. The possible values are the same as for /ssh/host-key-verification
.
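The precedence between the per-device setting and the global setting can be sketched as follows. This is a minimal Python illustration of the rule stated above, not NSO's implementation; `effective_level` and its arguments are hypothetical names, and `None` stands for "not explicitly set".

```python
def effective_level(device_level=None, global_level=None):
    """Resolve the host key verification level for one device.

    An explicitly set per-device value wins, then an explicitly set
    global value, then the documented default "reject-unknown".
    """
    if device_level is not None:
        return device_level
    if global_level is not None:
        return global_level
    return "reject-unknown"
```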
There are several actions that can be used to retrieve the host keys from a device and store them in the NSO configuration:
/devices/fetch-ssh-host-keys
: Retrieve the host keys for all devices. Successfully retrieved keys are committed to the configuration.
/devices/device-group/fetch-ssh-host-keys
: Retrieve the host keys for all devices in a device group. Successfully retrieved keys are committed to the configuration.
/devices/device/ssh/fetch-host-keys
: Retrieve the host keys for one or more devices. In the CLI, range expressions can be used for the device name, e.g. using '*' will retrieve keys for all devices, etc. The action will commit the retrieved keys if possible, i.e. if the device entry is already committed, otherwise (i.e. if the action is invoked from "configure mode" when the device entry has been created but not committed), the keys will be written to the current transaction, but not committed.
The fingerprints of the retrieved keys will be reported as part of the result from these actions, but it is also possible to ask for the fingerprints of already retrieved keys by invoking the /devices/device/ssh/host-key/show-fingerprint
action (/devices/device/live-status-protocol/ssh/host-key/show-fingerprint
for live-status protocols that use SSH).
This is very similar to the case of a connection to a managed device; it differs mainly in locations - and in the fact that SSH is always used for connection to a cluster node. The public host keys for a cluster node are stored in the /cluster/remote-node/ssh/host-key
list, in the same way as the host keys for a device. The keys can be configured e.g. via input directly in the CLI, but in most cases, it will be preferable to use the action described below to retrieve keys from the cluster node.
The level of host key verification can also be set per cluster node, via /cluster/remote-node/ssh/host-key-verification
. The default is to use the global value (or default) for /ssh/host-key-verification
, but any explicitly set value will override the global value. The possible values are the same as for /ssh/host-key-verification
.
The /cluster/remote-node/ssh/fetch-host-keys
action can be used to retrieve the host keys for one or more cluster nodes. In the CLI, range expressions can be used for the node name, e.g. using '*' will retrieve keys for all nodes, etc. The action will commit the retrieved keys if possible, but if it is invoked from "configure mode" when the node entry has been created but not committed, the keys will be written to the current transaction, but not committed.
The fingerprints of the retrieved keys will be reported as part of the result from this action, but it is also possible to ask for the fingerprints of already retrieved keys by invoking the /cluster/remote-node/ssh/host-key/show-fingerprint
action.
The private key used for public key authentication can be taken either from the SSH directory for the local user or from a list of private keys in the NSO configuration. The user's SSH directory is determined according to the same logic as for the server-side public keys that are authorized for authentication of a given user, see Public Key Login, but of course, different files in this directory are used, see below. Alternatively, the key can be configured in the /ssh/private-key
list, using an arbitrary name for the list key. In both cases, the key must be in PEM format (e.g. as generated by the OpenSSH ssh-keygen command), and it may be encrypted or not. Encrypted keys configured in /ssh/private-key
must have the passphrase for the key configured via /ssh/private-key/passphrase
.
The specific private key to use is configured via the authgroup
indirection and the umap
selection mechanisms, in the same way as for password authentication; public key authentication is simply a different alternative. Setting /devices/authgroups/group/umap/public-key
(or default-map
instead of umap
for users that are not in umap
) without any additional parameters will select the default of using a file called id_dsa
in the local user's SSH directory, which must have an unencrypted key. A different file name can be set via /devices/authgroups/group/umap/public-key/private-key/file/name
. For an encrypted key, the passphrase can be set via /devices/authgroups/group/umap/public-key/private-key/file/passphrase
, or /devices/authgroups/group/umap/public-key/private-key/file/use-password
can be set to indicate that the password used (if any) by the local user when authenticating to NSO should also be used as a passphrase for the key. To instead select a private key from the /ssh/private-key
list, the name of the key is set via /devices/authgroups/group/umap/public-key/private-key/name
.
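The selection rules above (umap entry first, default-map for unmapped users, a named key from /ssh/private-key or a file that defaults to id_dsa) can be sketched in Python. The nested dictionary layout and the `select_private_key` function are assumptions made for this illustration; they mirror the configuration paths mentioned in the text but are not an NSO API.

```python
def select_private_key(authgroup, local_user):
    """Return ("configured", name), ("file", filename), or None.

    authgroup is a hypothetical dict mirroring one authgroup entry:
    {"umap": {user: {...}}, "default-map": {...}}.
    """
    entry = authgroup.get("umap", {}).get(local_user)
    if entry is None:
        # Users that are not in umap fall back to default-map.
        entry = authgroup.get("default-map")
    if entry is None or "public-key" not in entry:
        return None  # public key authentication is not selected
    private_key = entry["public-key"].get("private-key", {})
    if "name" in private_key:
        # Key taken from the /ssh/private-key list.
        return ("configured", private_key["name"])
    # Otherwise a file in the local user's SSH directory, id_dsa by default.
    return ("file", private_key.get("file", {}).get("name", "id_dsa"))
```

For example, an entry with an empty `public-key` container resolves to `("file", "id_dsa")`, matching the default described above.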
This is again very similar to the case of a connection to a managed device, since the same authgroup
/umap
scheme is used. Setting /cluster/authgroup/umap/public-key
(or default-map
instead of umap
for users that are not in umap
) without any additional parameters will select the default of using a file called id_dsa
in the local user's SSH directory, which must have an unencrypted key. A different file name can be set via /cluster/authgroup/umap/public-key/private-key/file/name
. For an encrypted key, the passphrase can be set via /cluster/authgroup/umap/public-key/private-key/file/passphrase
, or /cluster/authgroup/umap/public-key/private-key/file/use-password
can be set to indicate that the password used (if any) by the local user when authenticating to NSO should also be used as a passphrase for the key. To instead select a private key from the /ssh/private-key
list, the name of the key is set via /cluster/authgroup/umap/public-key/private-key/name
.
Learn about NEDs, their types, and how to work with them.
Network Element Drivers (NEDs) provide the connectivity between NSO and the devices. NEDs are installed as NSO packages. For information on how to add a package for a new device type, see NSO .
To see the list of installed packages (you will not see the F5 BigIP):
The core parts of a NED are:
Data-Model: Independent of the underlying device interface technology, NEDs come with a data model in YANG that specifies the configuration data and operational data supported for the device.
For native NETCONF devices, the YANG comes from the device.
For JunOS, NSO generates the model from the JunOS XML schema.
For SNMP devices, NSO generates the model from the MIBs.
For CLI devices, the NED designer wrote the YANG to map the CLI.
Code: For NETCONF and SNMP devices, there is no code. For CLI devices, there is a minimal amount of code that manages connecting over SSH/Telnet and looking for version strings. The rest is auto-rendered from the data model.
There are four categories of NEDs depending on the device interface:
NETCONF NED: The device supports NETCONF, for example, Juniper.
CLI NED: Any device with a CLI that resembles a Cisco CLI.
Generic NED: Proprietary protocols like REST, and non-Cisco CLIs.
SNMP NED: An SNMP device.
Every device needs an auth group that tells NSO how to authenticate to the device:
The CLI snippet above shows that there is a mapping from the NSO users admin
and oper
to the remote user and password to be used on the devices. There are two options, either a mapping from the local user to the remote user or to pass the credentials. Below is a CLI example to create a new authgroup foobar
and map NSO user jim
:
This auth group will pass on joe
's credentials to the device.
There is a similar structure for SNMP devices authgroups snmp-group
that supports SNMPv1/v2c, and SNMPv3 authentication.
The SNMP auth group above has a default auth group for non-mapped users.
Make sure you know the authentication information and have created authgroups as above. Also verify details such as port numbers and credentials, and confirm that you can read and set the configuration over the relevant interface, for example the CLI if it is a CLI NED. So, for a CLI device, first try to ssh (or telnet) to the device and show and set some configuration.
All devices have an admin-state
with default value southbound-locked
. This means that if you do not set this value to unlocked, no commands will be sent to the device.
(See also examples.ncs/getting-started/using-ncs/2-real-device-cisco-ios
). Adding a new device on a specific address, using the standard SSH port, is straightforward:
See also /examples.ncs/getting-started/using-ncs/3-real-device-juniper
. Make sure that NETCONF over SSH is enabled on the JunOS device:
Then you can create an NSO NETCONF device as follows:
(See also examples.ncs/snmp-ned/basic/README
.) First of all, let's explain SNMP NEDs a bit. By default, all read-only objects are mapped to operational data in NSO, and read-write objects are mapped to configuration data. This means that a sync-from operation will load read-write objects into NSO. How can you reach read-only objects? Note that the following is true for all NED types that have modeled operational data. The device configuration exists at devices device config
and has a copy in CDB. NSO can speak live to the device to fetch for example counters by using the path devices device live-status
:
In many cases, SNMP NEDs are used for reading operational data in parallel with a CLI NED for writing and reading configuration data. More on that later.
Before trying NSO, use the net-snmp command-line tools or your favorite SNMP browser to verify that all settings are correct.
Adding an SNMP device, assuming that the NED is in place:
MIB Groups are important. A MIB group is just a named collection of SNMP MIB Modules. If you do not specify any MIB group for a device, NSO will try with all known MIBs. It is possible to create MIB groups with wild cards such as CISCO*
.
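As an illustration of how a wildcard such as CISCO* could select MIB modules, the following Python sketch uses shell-style matching from the standard library. This is an assumption made for the example: NSO's actual wildcard semantics may differ, and `mibs_for_device` and the MIB names passed to it are hypothetical inputs.

```python
from fnmatch import fnmatchcase


def mibs_for_device(known_mibs, group_patterns=None):
    """Return the MIB modules a device would use.

    If no MIB group is given, all known MIBs are tried, as described
    above. Otherwise a MIB is selected when it matches any pattern.
    """
    if group_patterns is None:
        return list(known_mibs)
    return [
        mib for mib in known_mibs
        if any(fnmatchcase(mib, pattern) for pattern in group_patterns)
    ]
```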
Generic devices are typically configured like a CLI device. Make sure you set the right address, port, protocol, and authentication information.
Below is an example of setting up NSO with F5 BigIP:
Assume you have a Cisco device that you would like NSO to configure over CLI but read statistics over SNMP. This can be achieved by adding settings for live-status-protocol
:
Device c0
has a config tree from the CLI NED and a live-status tree (read-only) from the SNMP NED using all MIBs in the group snmp
.
Devices have an admin-state
with the following values:
unlocked: the device can be modified and changes will be propagated to the real device.
southbound-locked: the device can be modified but changes will not be propagated to the real device. Can be used to prepare configurations before the device is available in the network.
locked: the device can only be read.
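The three admin-state values differ in two respects: whether the device configuration in NSO can be modified, and whether changes are propagated to the real device. A minimal Python sketch of that table follows; the names `ADMIN_STATES`, `can_modify`, and `will_propagate` are hypothetical and only restate the list above.

```python
# Behavior of each admin-state value, as described in the list above.
ADMIN_STATES = {
    "unlocked": {"modify": True, "propagate": True},
    "southbound-locked": {"modify": True, "propagate": False},
    "locked": {"modify": False, "propagate": False},
}


def can_modify(state):
    """True if the device configuration in NSO may be changed."""
    return ADMIN_STATES[state]["modify"]


def will_propagate(state):
    """True if committed changes are sent to the real device."""
    return ADMIN_STATES[state]["propagate"]
```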
The admin-state value southbound-locked is the default. This means that if you create a new device without explicitly setting this value, configuration changes will not propagate to the network. To see default values, use the pipe target details.
To analyze NED problems, turn on the tracing for a device and look at the trace file contents.
NSO pools SSH connections, and trace settings only affect new connections, so any open connection must be closed before the trace setting takes effect. Now you can inspect the raw communication between NSO and the device:
If NSO fails to talk to the device, the typical root causes are:
Learn basic operational scenarios and common CLI commands.
This section helps you to get started with NSO, learn basic operational scenarios, and get acquainted with the most common CLI commands.
Make sure that you have installed NSO and that you have sourced the ncsrc
file in $NCS_DIR
. This sets up the paths and environment variables to run NSO. As this must be done every time before running NSO, it is recommended to add it to your profile.
We will use the NSO network simulator to simulate three Cisco IOS routers. NSO will talk Cisco CLI to those devices. You will use the NSO CLI and Web UI to perform the tasks. Sometimes you will use the native Cisco device CLI to inspect configuration or do out-of-band changes.
Note that both the NSO software (NCS) and the simulated network devices run on your local machine.
To start the simulator:
Go to examples.ncs/getting-started/using-ncs/1-simulated-cisco-ios
. First of all, we will generate a network simulator with three Cisco devices. They will be called c0
, c1
, and c2
.
Most of this section follows the procedure in the README
file, so it is useful to have it opened as well.
Perform the following command:
This creates three simulated devices all running Cisco IOS and they will be named c0
, c1
, c2
.
Start the simulator.
Run the CLI toward one of the simulated devices.
This shows that the device has some initial configurations.
The previous step started the simulated Cisco devices. It is now time to start NSO.
The first action is to prepare directories needed for NSO to run and populate NSO with information on the simulated devices. This is all done with the ncs-setup
command. Make sure that you are in the examples.ncs/getting-started/using-ncs/1-simulated-cisco-ios
directory. (Again, ignore the details for the time being.)
Note the .
at the end of the command referring to the current directory. What the command does is to create directories needed for NSO in the current directory and populate NSO with devices that are running in netsim. We call this the "run-time" directory.
Start NSO.
Start the NSO CLI as the user admin
with a Cisco XR-style CLI.
NSO also supports a J-style CLI, which is started by passing -J to the command, like this:
Throughout this user guide, we will show the commands in Cisco XR style.
At this point, NSO only knows the address, port, and authentication information of the devices. This management information was loaded to NSO by the setup utility. It also tells NSO how to communicate with the devices by using NETCONF, SNMP, Cisco IOS CLI, etc. However, at this point, the actual configuration of the individual devices is unknown.
Let us analyze the above CLI command. First of all, when you start the NSO CLI it starts in operational mode, so to show configuration data, you have to explicitly run show running-config
.
NSO manages a list of devices; each device is reached by the path devices device "name"
. You can use standard tab completion in the CLI to learn this.
The address
and port
fields tell NSO where to connect to the device. For now, they all live on localhost with different ports. The device-type
structure tells NSO it is a CLI device and the specific CLI is supported by the Network Element Driver (NED) cisco-ios
. A more detailed explanation of how to configure the device-type structure and how to choose NEDs will be addressed later in this guide.
So now NSO can try to connect to the devices:
NSO does not need to have the connections active continuously. Instead, NSO establishes a connection when needed, and connections are pooled to conserve resources. At this time, NSO can read the configurations from the devices and populate the configuration database, CDB.
The following command will synchronize the configurations of the devices with the CDB and respond with true
if successful:
The NSO data store, CDB, will store the configuration for every device at the path devices device "name" config
; everything after this path is the configuration in the device. NSO keeps this synchronized. The synchronization is managed with the following principles:
At initialization, NSO can discover the configuration as shown above.
The modus operandi when using NSO to perform configuration changes is that the network engineer uses NSO (CLI, WebUI, REST,...) to modify the representation in NSO CDB. The changes are committed to the network as a transaction that includes the actual devices. Only if all changes succeed on the actual devices will they be committed to the NSO data store. The transaction also covers the devices, so if any of the devices participating in the transaction fails, NSO will roll back the configuration changes on all modified devices. This works even for devices that do not natively support rollback, such as Cisco IOS CLI.
NSO can detect out-of-band changes and reconcile them by either updating the CDB or modifying the configuration on the devices to reflect the currently stored configuration.
NSO only needs to be synchronized with the devices in the event of a change being made outside of NSO. Changes made using NSO will be reflected in both the CDB and the devices. The following sequence of actions is therefore not needed:
Perform configuration change via NSO.
Perform sync-from action.
The above incorrect (or not necessary) sequence stems from the assumption that the NSO CLI talks directly to the devices. This is not the case; the northbound interfaces in NSO modify the configuration in the NSO data store, and NSO calculates a minimum difference between the current configuration and the new configuration, passing only the changes to the NEDs, which run the commands on the devices. All of this happens as one single change-set.
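The idea of a minimum difference can be illustrated with a small Python sketch that compares two nested dictionaries and returns only the operations needed to turn one into the other. This is a simplified model, not NSO's diff algorithm; `config_diff` and the sample configuration keys are hypothetical.

```python
def config_diff(current, desired, path=()):
    """Return a list of (operation, path, value) tuples.

    Only nodes that differ between current and desired are reported;
    unchanged subtrees produce no operations.
    """
    changes = []
    for key in sorted(set(current) | set(desired)):
        here = path + (key,)
        if key not in desired:
            changes.append(("delete", here, None))
        elif key not in current:
            changes.append(("set", here, desired[key]))
        elif isinstance(current[key], dict) and isinstance(desired[key], dict):
            # Both sides have a subtree here: descend and compare children.
            changes.extend(config_diff(current[key], desired[key], here))
        elif current[key] != desired[key]:
            changes.append(("set", here, desired[key]))
    return changes
```

In this model, changing one leaf in a large configuration yields a single `set` operation, which corresponds to sending only the changed commands to the device.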
View the configuration of the c0
device using the command:
Or, show a particular piece of configuration from several devices:
Or, show a particular piece of configuration from all devices:
The CLI can pipe commands; try TAB after |
to see various pipe targets:
The above command shows the router config of all devices as XML and then saves it to a file router.xml
.
To change the configuration, enter configure mode.
Change or add some configuration across the devices, for example:
It is important to understand how NSO applies configuration changes to the network. At this point, the changes are local to NSO; no configurations have been sent to the devices yet. Since the NSO Configuration Database (CDB) is in sync with the network, NSO can calculate the minimum diff to apply the changes to the network.
The command below compares the ongoing changes with the running database:
It is possible to dry-run the changes to see the native Cisco CLI output (in this case almost the same as above):
The changes can be committed to the devices and the NSO CDB simultaneously with a single commit. In the commit command below, we pipe to details to understand the actions being taken.
Changes are committed to the devices and the NSO database as one transaction. If any of the device configurations fail, all changes will be rolled back and the devices will be left in the state that they were in before the commit and the NSO CDB will not be updated.
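The all-or-nothing behavior can be modeled with a short Python sketch. It is a toy model under stated assumptions, not how NSO implements transactions: `FakeDevice`, `DeviceError`, and `commit` are hypothetical names, devices are plain dictionaries, and rollback is done by restoring a snapshot.

```python
class DeviceError(Exception):
    """Raised when a (simulated) device rejects a change or does not respond."""


class FakeDevice:
    def __init__(self, name, config, fail=False):
        self.name = name
        self.config = dict(config)
        self.fail = fail  # simulate a failing or non-responding device

    def apply(self, changes):
        if self.fail:
            raise DeviceError(f"{self.name} rejected the change")
        self.config.update(changes)


def commit(cdb, devices, changes):
    """Apply changes to every device; update cdb only if all succeed."""
    snapshots = {}
    try:
        for device in devices:
            snapshots[device.name] = dict(device.config)
            device.apply(changes)
    except DeviceError:
        # Roll back every device touched so far; cdb is left untouched.
        for device in devices:
            if device.name in snapshots:
                device.config = snapshots[device.name]
        return False
    for device in devices:
        cdb[device.name] = dict(device.config)
    return True
```

If one device fails, the devices that had already accepted the change are restored, and the stored copy in `cdb` is not updated.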
There are numerous options to the commit command which will affect the behavior of the atomic transactions:
As seen by the details output, NSO stores a roll-back file for every commit so that the whole transaction can be rolled back manually. The following is an example of a rollback file:
(Viewing files is an operational command; prefixing a command in configuration mode with do
executes it in operational mode.) To perform a manual rollback, first load the rollback file:
apply-rollback-file
by default restores to that saved configuration; adding selective
as a parameter allows you to just roll back the delta in that specific rollback file. Show the differences:
Commit the rollback:
A trace log can be created to see what is going on between NSO and the device CLI. Use the following command to enable tracing:
Note that the trace settings only take effect for new connections, so it is important to disconnect the current connections. Make a change to, for example, c0
:
Note the use of the command commit dry-run outformat native
. This will display the net result device commands that will be generated over the native interface without actually committing them to the CDB or the devices. In addition, there is the possibility to append the reverse
flag that will display the device commands for getting back to the current running state in the network if the commit is successfully executed.
Exit from the NSO CLI and return to the Unix Shell. Inspect the CLI trace:
As seen above, ranges can be used to send configuration commands to several devices. Device groups can be created to allow for group actions that do not require naming conventions. A group can reference any number of devices. A device can be part of any number of groups, and groups can be hierarchical.
The command sequence below creates a group of core devices and a group with all devices. Note that you can use tab completion when adding the device names to the group. Also, note that it requires configuration mode. (If you are still in the Unix Shell from the steps above, run ncs_cli -C -u admin
).
Note well the do show
which shows the operational data for the groups. Device groups have a member attribute that shows all member devices, flattening any group members.
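The flattening of hierarchical groups into a single member list can be sketched in Python. The dictionary layout below, with `device-name` and `device-group` keys, is an assumption made for this illustration, and `flatten_members` is a hypothetical helper rather than an NSO function.

```python
def flatten_members(groups, name, _seen=None):
    """Return the set of all devices in a group, including nested groups."""
    seen = set() if _seen is None else _seen
    if name in seen:
        return set()  # guard against groups that reference each other
    seen.add(name)
    group = groups[name]
    members = set(group.get("device-name", []))
    for subgroup in group.get("device-group", []):
        members |= flatten_members(groups, subgroup, seen)
    return members
```

For a group that lists one device directly and includes a subgroup with two devices, the flattened member set contains all three devices.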
Device groups can contain different devices as well as devices from different vendors. Configuration changes will be committed to each device in its native language without needing to be adjusted in NSO.
You can, for example, at this point use the group to check if all core
devices are in sync:
Assume that we would like to manage permit lists across devices. This can be achieved by defining templates and applying them to device groups. The following CLI sequence defines a tiny template, called community-list
:
This can now be applied to a device group:
What if the device group core
contained different vendors? Since the configuration is written in IOS the above template would not work on Juniper devices. Templates can be used on different device types (read NEDs) by using a prefix for the device model. The template would then look like:
The above indicates how NSO manages different models for different device types. When NSO connects to the devices, the NED checks the device type and revision and returns that to NSO. This can be inspected (note, in operational mode):
So here we see that c0
uses a tailf-ned-cisco-ios
module which tells NSO which data model to use for the device. Every NED comes with a YANG data model for the device. This renders the NSO data store (CDB) schema, the NSO CLI, WebUI, and southbound commands.
The model introduces namespace prefixes for every configuration item. This also resolves issues around different vendors using the same configuration command for different configuration elements. Note that every item is prefixed with ios
:
Another important question is how to control if the template merges the list or replaces the list. This is managed via tags. The default behavior of templates is to merge the configuration. Tags can be inserted at any point in the template. Tag values are merge
, replace
, delete
, create
and nocreate
.
Assume that c0
has the following configuration:
If we apply the template the default result would be:
We could change the template in the following way to get a result where the permit list would be replaced rather than merged. When working with tags in templates, it is often helpful to view the template as a tree rather than a command view. The CLI has a display option for showing a curly-braces tree view that corresponds to the data-model structure rather than the command set. This makes it easier to see where to add tags.
Different tags can be added across the template tree. If we now apply the template to the device c0
which already has community lists, the following happens:
Any existing values in the list are replaced in this case. The following tags are available:
merge
(default): the template changes will be merged with the existing configuration.
replace
: the existing configuration will be replaced by the template configuration.
create
: the template will create those nodes that do not exist. If a node already exists this will result in an error.
nocreate
: the merge will only affect configuration items that already exist in the configuration. It will never create the configuration with this tag, or any associated commands inside it. It will only modify existing configuration structures.
delete
: delete anything from this point.
Note that a template can have different tags along the tree nodes.
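The effect of the tags can be illustrated with a simplified Python model that applies a nested-dictionary "template" to a nested-dictionary "configuration". This is a sketch under several assumptions and not NSO's template engine: the `_tag` marker key and the `apply_template` function are hypothetical, list entries are modeled as dictionary keys, tags are assumed to be inherited by child nodes until overridden, and nested tags below a `replace` or `create` node are ignored.

```python
import copy

TAG = "_tag"  # hypothetical marker key used only in this sketch


def strip_tags(node):
    """Return a copy of a template subtree without tag markers."""
    if isinstance(node, dict):
        return {k: strip_tags(v) for k, v in node.items() if k != TAG}
    return copy.deepcopy(node)


def apply_template(config, template, tag="merge"):
    """Apply template onto config in place, honoring per-node tags."""
    for key, sub in template.items():
        if key == TAG:
            continue
        node_tag = sub.get(TAG, tag) if isinstance(sub, dict) else tag
        exists = key in config
        if node_tag == "delete":
            config.pop(key, None)
        elif node_tag == "replace":
            config[key] = strip_tags(sub)
        elif node_tag == "create":
            if exists:
                raise ValueError(f"{key} already exists")
            config[key] = strip_tags(sub)
        elif node_tag == "nocreate" and not exists:
            continue  # never create missing nodes under nocreate
        elif isinstance(sub, dict):
            if not isinstance(config.get(key), dict):
                config[key] = {}
            apply_template(config[key], sub, node_tag)
        else:
            config[key] = sub
    return config
```

With the default `merge`, a template entry is added next to the existing list entries; with `replace` on the list node, the existing entries are dropped and only the template entries remain.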
A problem with the above template is that every value is hard-coded. What if you wanted a template where the community-list
name and permit-list
value are variables passed to the template when applied? Any part of a template can be a variable (or, more precisely, an XPath expression). We can modify the template to use variables in the following way:
The template now requires two parameters when applied (tab completion will prompt for the variable):
Note that the replace
tag was still part of the template and it would delete any existing community lists, which is probably not the desired outcome in the general case.
The template mechanism described so far is "fire-and-forget". The templates do not have any memory of what happened to the network, or which devices they touched. A user can modify the templates without anything happening to the network until an explicit apply-template
action is performed. (Templates are, like all configuration changes, applied as a transaction.) NSO also supports service templates that are more advanced in many ways; more information on this will be presented later in this guide.
Also, note that device templates have some additional restrictions on the values that can be supplied when applying the template. In particular, a value must either be a number or a single-quoted string. It is currently not possible to specify a value that contains a single quote (').
To make sure that configuration is applied according to site or corporate rules, you can use policies. Policies are validated at every commit. They can be of type error
, which implies that the change cannot go through, or warning
, which means that you have to confirm a configuration that gives a warning.
A policy is composed of:
Policy name.
Iterator: loop over a path in the model, for example, all devices, all services of a specific type.
Expression: a boolean expression that must be true for every node returned from the iterator, for example, SNMP must be turned on.
Warning or error: a message displayed to the user. If it is of type warning the user can still commit the change, if of type error the change cannot be made.
An example is shown below:
Now, if we try to delete a class-map
a
, we will get a policy violation:
The {name}
variable refers to the node set from the iterator. This node-set will be the list of devices in NSO and the devices have an attribute called 'name'.
To understand the syntax of the expressions, a pipe target in the CLI can be used:
To debug policies, look at the end of logs/xpath.trace. This file shows all validated XPath expressions and any errors.
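The four parts of a policy described above (name, iterator, expression, and warning or error) can be modeled in a few lines. The sketch below is a conceptual model only: it uses plain Python callables where NSO uses XPath, and every class, function, and field name is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Policy:
    name: str
    iterator: Callable[[dict], Iterable[dict]]  # yields the nodes to check
    expression: Callable[[dict], bool]          # must be true for every node
    message: str                                # may refer to {name}
    severity: str                               # "warning" or "error"


def validate(policy: Policy, config: dict) -> List[str]:
    """Return one message per node for which the expression is false."""
    return [
        policy.message.format(**node)
        for node in policy.iterator(config)
        if not policy.expression(node)
    ]


def can_commit(policies, config, confirm_warnings=False) -> bool:
    """Errors always block the commit; warnings block until confirmed."""
    for policy in policies:
        if not validate(policy, config):
            continue
        if policy.severity == "error":
            return False
        if policy.severity == "warning" and not confirm_warnings:
            return False
    return True


# Hypothetical policy: every device must have a class-map named "a".
class_map_policy = Policy(
    name="class-map",
    iterator=lambda cfg: cfg["devices"],
    expression=lambda dev: "a" in dev["class_maps"],
    message="Device {name} must have a class-map a",
    severity="warning",
)
```

The {name} placeholder in the message is filled in from each node returned by the iterator, which is the same role the {name} variable plays in the policy message described above.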
In reality, network engineers will still modify configurations using other tools, such as an out-of-band CLI or other management interfaces. It is important to understand how NSO manages this. The NSO network simulator supports CLI towards the devices. For example, we can use the IOS CLI on, say, c0 and delete a permit-list.
From the UNIX shell, start a CLI session towards c0.
Start the NSO CLI again:
NSO detects if its configuration copy in CDB differs from the configuration in the device. Various strategies are used depending on device support: transaction IDs, time stamps, and configuration hash-sums. For example, an NSO user can request a check-sync operation:
NSO can also compare the configurations with the CDB and show the difference:
At this point, we can choose if we want to use the configuration stored in the CDB as the valid configuration or the configuration on the device:
In the above example, we chose to overwrite the device configuration from NSO.
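Of the strategies mentioned above, the hash-sum comparison is the easiest to illustrate. The sketch below is an assumption-laden model, not NSO code: configurations are flat dictionaries, the function names only echo the CLI operations, and the diff direction ("-" for entries that exist only in the CDB copy) is chosen for the example.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Hash a canonical (key-sorted) rendering of the configuration."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_sync(cdb_copy: dict, device_config: dict) -> str:
    """Cheap comparison: only the two hash sums are compared."""
    same = config_hash(cdb_copy) == config_hash(device_config)
    return "in-sync" if same else "out-of-sync"


def compare_config(cdb_copy: dict, device_config: dict) -> list:
    """Full comparison: list the entries that differ.

    '-' marks an entry present only in the CDB copy,
    '+' marks an entry present only on the device.
    """
    lines = []
    for key in sorted(set(cdb_copy) | set(device_config)):
        if cdb_copy.get(key) == device_config.get(key):
            continue
        if key in cdb_copy:
            lines.append(f"- {key} {cdb_copy[key]}")
        if key in device_config:
            lines.append(f"+ {key} {device_config[key]}")
    return lines
```

The hash check answers only whether the two copies differ; the full comparison is needed to decide which side to treat as valid.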
NSO will also detect out-of-sync when committing changes. In the following scenario, a local c0 CLI user adds an interface. Later the NSO user tries to add an interface:
At this point, we have two diffs:
The device and NSO CDB (devices device compare-config).
The ongoing transaction and CDB (show configuration).
To resolve this, you can choose to synchronize the configuration between the devices and the CDB before committing. There is also an option to override the out-of-sync check:
Or:
As noted before, all changes are applied as complete transactions of all configurations on all of the devices. Either all configuration changes are completed successfully or all changes are removed entirely. Consider a simple case where one of the devices is not responding. For the transaction manager, an error response from a device and a non-responding device are both errors, and the transaction should automatically roll back to the state before the commit command was issued.
Stop c0:
Go back to the NSO CLI and perform a configuration change over c0 and c1:
NSO sends commands to all devices in parallel, not sequentially. If any of the devices fail to accept the changes or report an error, NSO will issue a rollback to the other devices. Note that this also works for non-transactional devices like IOS CLI and SNMP. It works even for non-symmetrical cases where the rollback command sequence is not just the reverse of the commands. NSO does this by treating the rollback as it would any other configuration change: it uses the current configuration and the previous configuration to generate the commands needed to roll back the configuration changes.
The configuration diff is still in the private CLI session. It can be restored or modified (if the error was due to something in the configuration), or, in some cases, the device can be fixed instead.
NSO is not a best-effort configuration management system. The error reporting, coupled with the ability to completely roll back failed changes to the devices, ensures that the configurations stored in the CDB and the configurations on the devices are always consistent and that no failed or orphan configurations are left on the devices.
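The all-or-nothing behavior described above can be sketched as follows. This is a simplified model under stated assumptions: the Device class and commit function are hypothetical, devices are updated sequentially rather than in parallel, and the rollback restores a saved snapshot instead of generating device commands from the previous configuration.

```python
class Device:
    """Hypothetical stand-in for a managed device."""

    def __init__(self, name, config=None, reachable=True):
        self.name = name
        self.config = dict(config or {})
        self.reachable = reachable

    def apply(self, changes: dict) -> None:
        if not self.reachable:
            raise ConnectionError(f"{self.name} is not responding")
        self.config.update(changes)


def commit(devices, changes: dict) -> bool:
    """Apply changes to every device, or leave every device untouched."""
    snapshots = {}
    try:
        for dev in devices:
            # Remember the previous configuration before touching the device.
            snapshots[dev.name] = dict(dev.config)
            dev.apply(changes)
    except ConnectionError:
        # Any failure rolls back every device that was already changed.
        for dev in devices:
            if dev.name in snapshots:
                dev.config = snapshots[dev.name]
        return False
    return True
```

If c0 is stopped, a change over c0 and c1 fails as a whole, and c1 ends up with the configuration it had before the commit.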
First of all, if the above was not a multi-device transaction, meaning that the change should be applied independently device per device, then it is just a matter of performing the commit between the devices.
Second, NSO has a commit flag commit-queue async or commit-queue sync. The commit queue should primarily be used for throughput reasons when doing configuration changes in large networks. Atomic transactions come with a cost: the critical section of the database is locked when committing the transaction on the network. So, in cases where northbound systems of NSO generate many simultaneous large configuration changes, these might get queued. The commit queue sends the device commands after the lock has been released, so the database lock is held for a much shorter time. If any device fails, an alarm will be raised.
Go to the UNIX shell, start the device, and monitor the commit queue:
Devices can also be pre-provisioned. This means that the configuration can be prepared in NSO and pushed to the device when it is available. To illustrate this, we can start by adding a new device to NSO that is not available in the network simulator:
Above, we added a new device to NSO with the IP address of the local host and port 10030. This device does not exist in the network simulator. We can tell NSO not to send any commands southbound by setting the admin-state to southbound-locked (actually the default). This means that all configuration changes will succeed, and the result will be stored in CDB. At any point in time when the device is available in the network, the state can be changed and the complete configuration pushed to the new device. The CLI sequence below also illustrates a powerful copy configuration command that can copy any configuration from one device to another. The from and to paths are separated by the keyword to.
As shown above, check-sync operations will tell the user that the device is southbound locked. When the device is available in the network, the device can be synchronized with the current configuration in the CDB using the sync-to action.
Different users or management tools can of course run parallel sessions to NSO. All ongoing sessions have a logical copy of CDB. An important case to understand is the conflict that arises when multiple users attempt to modify the same device configuration at the same time with different changes. First, let's look at the CLI sequence below, with user admin to the left and user joe to the right.
There is no conflict in the above sequence. community is a list, so both joe and admin can add items to the list. Note that user joe gets information about the user admin committing.
On the other hand, if two users modify an ordered-by user list in such a way that one user rearranges the list, along with other non-conflicting modifications, and one user deletes the entire list, the following happens:
In this case, joe commits a change to access-list after admin and a conflict message is displayed. Since the conflict is non-resolvable, the transaction has to be reverted. To reapply the changes made by joe to logging in a new transaction, the following commands are entered:
In this case, joe tries to reapply the changes made in the previous transaction and, since access-list 10 has been removed, the move command will fail when applied by the reapply-commands command. Since the mode is best-effort, the next command will be processed. The changes to logging will succeed and joe then commits the transaction.
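The best-effort behavior described above, where a failing command is skipped and the next one is still processed, can be sketched as follows. The functions and the configuration layout are hypothetical and only mimic the scenario; they are not the NSO implementation of reapply-commands.

```python
def move_first(config: dict, list_name: str, item: str) -> None:
    """Move item to the front of an ordered list."""
    entries = config[list_name]  # KeyError if the whole list was deleted
    entries.remove(item)         # ValueError if the item no longer exists
    entries.insert(0, item)


def set_leaf(config: dict, key: str, value: str) -> None:
    config[key] = value


def reapply_commands(config: dict, commands, best_effort: bool = True) -> list:
    """Replay (function, args) pairs; return the commands that failed.

    In best-effort mode a failing command is recorded and skipped.
    Otherwise the first failure aborts the replay.
    """
    failed = []
    for func, args in commands:
        try:
            func(config, *args)
        except (KeyError, ValueError):
            if not best_effort:
                raise
            failed.append((func.__name__, args))
    return failed
```

Replaying a move on a deleted list followed by a change to logging reports the move as failed and still applies the logging change.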
Manage the life-cycle of network services.
NSO can also manage the life-cycle for services like VPNs, BGP peers, and ACLs. It is important to understand what is meant by service in this context:
NSO abstracts the device-specific details. The user only needs to enter attributes relevant to the service.
The service instance has configuration data itself that can be represented and manipulated.
A service instance configuration change is applied to all affected devices.
The following are the features that NSO uses to support service configuration:
Service Modeling: Network engineers can model the service attributes and the mapping to device configurations. For example, this means that a network engineer can specify a data model for VPNs with router interfaces, VLAN ID, VRF, and route distinguisher.
Service Life-cycle: Less sophisticated configuration management systems can only create an initial service instance in the network; they do not support changing or deleting a service instance. With NSO, you can at any point in time modify service elements like the VLAN ID of a VPN, and NSO can generate the corresponding changes to the network devices.
Service Instance: The NSO service instance has configuration data that can be represented and manipulated. At run time, the service model updates all NSO northbound interfaces so that a network engineer can view and manipulate the service instance over CLI, WebUI, REST, etc.
References between Service Instances and Device Configuration: NSO maintains references between service instances and device configuration. This means that a VPN instance knows exactly which device configurations it created or modified. Every configuration stored in the CDB is mapped to the service instance that created it.
An example is the best way to illustrate how services are created and used in NSO. As described in the sections about devices and NEDs, NEDs come in packages. The same is true for services: whether you design the services yourself or use ready-made service applications, the result is a package that is loaded into NSO.
The example examples.ncs/service-provider/mpls-vpn will be used to explain NSO Service Management features. This example illustrates Layer-3 VPNs in a service provider MPLS network. The example network consists of Cisco ASR 9k and Juniper core routers (P and PE) and Cisco IOS-based CE routers. The Layer-3 VPN service configures the CE/PE routers for all endpoints in the VPN with BGP as the CE/PE routing protocol. The Layer-2 connectivity between CE and PE routers is expected to be done through a Layer-2 Ethernet access network, which is out of scope for this example. The Layer-3 VPN service includes VPN connectivity as well as bandwidth and QOS parameters.
The service configuration only has references to CE devices for the endpoints in the VPN. The service mapping logic reads from a simple topology model that is configuration data in NSO, outside the actual service model, and derives what other network devices to configure.
The topology information has two parts:
The first part lists connections in the network and is used by the service mapping logic to find out which PE router to configure for an endpoint. The snippets below show the configuration output in the Cisco-style NSO CLI.
The second part lists devices for each role in the network and is in this example only used to dynamically render a network map in the Web UI.
The QOS configuration in service provider networks is complex and often requires a lot of different variations. It is also often desirable to be able to deliver different levels of QOS. This example shows how a QOS policy configuration can be stored in NSO and referenced from VPN service instances. Three different levels of QOS policies are defined: GOLD, SILVER, and BRONZE, with different queuing parameters.
Three different traffic classes are also defined with a DSCP value that will be used inside the MPLS core network as well as default rules that will match traffic to a class.
Run the example as follows:
Make sure that you start clean, i.e. no old configuration data is present. If you have been running this or some other example before, make sure to stop any NSO or simulated network nodes (ncs-netsim) that you may have running. Output like 'connection refused (stop)' means no previous NSO was running, and 'DEVICE ce0 connection refused (stop)...' means no simulated network was running, which is good.
This will set up the environment and start the simulated network.
Before creating a new L3VPN service, we must sync the configuration from all network devices and then enter config mode. (A hint for this complete section is to have the README file from the example at hand and cut and paste the CLI commands.)
Add another VPN.
The above sequence showed how NSO can be used to manipulate service abstractions on top of devices. Services can be defined for various purposes such as VPNs, Access Control Lists, firewall rules, etc. Support for services is added to NSO via a corresponding service package.
A service package in NSO comprises two parts:
Service model: the attributes of the service, and input parameters given when creating the service. In this example name, as-number, and end-points.
Mapping: what is the corresponding configuration of the devices when the service is applied. The result of the mapping can be inspected by the commit dry-run outformat native command.
We show how to define this later; for now, assume that the job is done.
When NSO applies services to the network, NSO stores the service configuration along with the resulting device configuration changes. This is used as a base for the FASTMAP algorithm, which can automatically derive device configuration changes from a service change.
Example 1
Going back to the example L3 VPN above, any part of the volvo VPN instance can be modified.
A simple change like changing the as-number on the service results in many changes in the network. NSO does this automatically.
Example 2
Let us look at a more challenging modification.
A common use case is of course to add a new CE device and add that as an end-point to an existing VPN. Below is the sequence to add two new CE devices and add them to the VPNs. (In the CLI snippets below we omit the prompt to enhance readability).
First, we add them to the topology:
Note well that the above just updates NSO local information on topological links. It has no effect on the network. The mapping for the L3 VPN services does a look-up in the topology connections to find the corresponding pe router.
Next, we add them to the VPNs:
Before we send anything to the network, let's look at the device configuration using a dry run. As you can see, both new CE devices are connected to the same PE router, but for different VPN customers.
Finally, commit the configuration to the network:
Next, we will show how NSO can be used to check if the service configuration in the network is up to date.
In a new terminal window, we connect directly to the device ce0, which is a Cisco device emulated by the tool ncs-netsim.
We will now reconfigure an edge interface that we previously configured using NSO.
Going back to the terminal with NSO, check the status of the network configuration:
The CLI sequence above performs 3 different comparisons:
Real device configuration versus device configuration copy in NSO CDB.
Expected device configuration from the service perspective and device configuration copy in CDB.
Expected device configuration from the service perspective and real device configuration.
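The three comparisons listed above can be modeled with plain dictionaries. This is a conceptual sketch only: the function names and the flat key/value configuration are assumptions, and the "expected" configuration is taken to be the subset of device settings that the service is supposed to produce.

```python
def contains(config: dict, expected: dict) -> bool:
    """True if every expected setting is present in config with the same value."""
    return all(config.get(key) == value for key, value in expected.items())


def three_way_check(real_device: dict, cdb_copy: dict, service_expected: dict) -> dict:
    """Run the three comparisons and report each result separately."""
    return {
        "device_vs_cdb": real_device == cdb_copy,
        "service_vs_cdb": contains(cdb_copy, service_expected),
        "service_vs_device": contains(real_device, service_expected),
    }
```

After an out-of-band change on the device, the first and third comparisons fail while the second still passes. Once the CDB copy is refreshed from the device, the first passes and the second fails, which corresponds to a service that is out of sync.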
Notice that the service volvo is out of sync with the service configuration. Use check-sync outformat cli to see what the problem is:
Assume that a network engineer considers the real device configuration to be authoritative:
And then restore the service:
In the same way as NSO can calculate any service configuration change, it can also automatically delete the device configurations that resulted from creating services:
It is important to understand the two diffs shown above. The first diff, the output of show configuration, shows the diff at the service level. The second diff shows the output generated by NSO to clean up the device configurations.
Finally, we commit the changes to delete the service.
Service instances live in the NSO data store, as does a copy of the device configurations. NSO maintains the relationships between the two.
Show the configuration for a service
You can ask NSO to list all devices that are touched by a service and vice versa:
Note that operational mode in the CLI was used above. Every service instance has an operational attribute that is maintained by the transaction manager and shows which device configuration it created. Furthermore, every device configuration has backward pointers to the corresponding service instances:
The reference counter above makes sure that NSO will not delete shared resources until the last service instance is deleted. The context-match search is helpful: it displays the path to all matching configuration items.
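The idea of backward pointers and reference counting can be sketched as follows. The class and method names are hypothetical, and configuration items are identified by simple path strings chosen for the example; this is not how NSO stores the information internally.

```python
from collections import defaultdict


class BackPointers:
    """Track which service instances created each configuration path."""

    def __init__(self):
        self._owners = defaultdict(set)  # config path -> set of service names

    def create(self, service: str, paths) -> None:
        for path in paths:
            self._owners[path].add(service)

    def refcount(self, path: str) -> int:
        return len(self._owners.get(path, ()))

    def delete(self, service: str) -> list:
        """Remove a service and return the paths that are now unreferenced.

        Paths still referenced by another service are kept.
        """
        removable = []
        for path in list(self._owners):
            owners = self._owners[path]
            if service in owners:
                owners.discard(service)
                if not owners:
                    removable.append(path)
                    del self._owners[path]
        return sorted(removable)
```

A path shared by two services has a reference count of two and is only reported as removable when the second service is deleted.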
When committing a service using the commit queue in async mode, the northbound system cannot rely on the service being fully activated in the network when the activation request returns.
We will now commit a VPN service using the commit queue while one device is down.
This service is not provisioned fully in the network, since ce0 was down. It will stay in the queue until either the device starts responding or an action is taken to remove the service or the queue item. The commit queue can be inspected. As shown below, we are waiting for ce0. Inspecting the queue item shows the outstanding configuration.
The commit queue will constantly try to push the configuration towards the devices. The number of retry attempts and at what interval they occur can be configured.
If we start ce0 and inspect the queue, we will see that the queue will finally be empty and that the commit-queue status for the service is empty.
In some scenarios, it makes sense to remove the service configuration from the network but keep the representation of the service in NSO. This is called to un-deploy a service.
To have NSO deploy services across devices, two pieces are needed:
A service model in YANG: the service model shall define the black-box view of a service; which are the input parameters given when creating the service? This YANG model will render an update of all NSO northbound interfaces, for example, the CLI.
Mapping: given the service input parameters, what is the resulting device configuration? This mapping can be defined in templates, code, or a combination of both.
Navigate to the simulated ios directory and create a new package for the VLAN service model:
If the packages folder does not exist yet, such as when you have not run this example before, you will need to invoke the ncs-setup and ncs-netsim create-network commands as described in the 1-simulated-cisco-ios README file.
The next step is to create the template skeleton by using the ncs-make-package utility:
This results in a directory structure:
For now, let's focus on the src/yang/vlan.yang file.
This simple VLAN service model says:
We give a VLAN a name, for example, net-1. The name must be unique, so it is specified as the key.
The VLAN has an id from 1 to 4096.
The VLAN is attached to a list of devices and interfaces. To make this example as simple as possible the interface reference is selected by picking the type and then the name as a plain string.
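The constraints listed above can be mirrored in a small data model. This is not the YANG model itself, only a sketch of the same rules in Python; the class names are hypothetical and the interface type values are assumed, since the enumeration is not shown in this section.

```python
from dataclasses import dataclass, field

# Assumed enumeration values, for illustration only.
INTERFACE_TYPES = ("FastEthernet", "GigabitEthernet")


@dataclass
class DeviceIf:
    device_name: str
    interface_type: str
    interface: str  # plain string, not checked against the device

    def __post_init__(self):
        if self.interface_type not in INTERFACE_TYPES:
            raise ValueError(f"unknown interface type: {self.interface_type}")


@dataclass
class Vlan:
    name: str       # the key, unique among all VLAN services
    vlan_id: int    # 1 to 4096, as stated above
    device_if: list = field(default_factory=list)

    def __post_init__(self):
        if not 1 <= self.vlan_id <= 4096:
            raise ValueError(f"VLAN id out of range: {self.vlan_id}")


class VlanList:
    """A keyed list: a second VLAN with the same name is rejected."""

    def __init__(self):
        self._by_name = {}

    def add(self, vlan: Vlan) -> None:
        if vlan.name in self._by_name:
            raise KeyError(f"VLAN {vlan.name} already exists")
        self._by_name[vlan.name] = vlan
```

The interface name is deliberately left as an unchecked string, matching the simplification made in the service model.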
The good thing with NSO is that already at this point you could load the service model into NSO and check whether it works well in the CLI etc. Nothing would happen to the devices since we have not defined the mapping, but this is normally the way to iterate a model and test the CLI towards the network engineers.
To build this service model, cd to $NCS_DIR/examples.ncs/getting-started/using-ncs/1-simulated-cisco-ios/packages/vlan/src and type make (assuming you have the make build system installed).
Go to the root directory of the simulated-ios example:
Start netsim, NSO, and the CLI:
When starting NSO above, we give NSO a parameter to reload all packages so that our newly added vlan package is included. Packages can also be reloaded without a restart. At this point we have a service model for VLANs, but no mapping of VLAN to device configurations. This is fine; we can try the service model and see if it makes sense. Create a VLAN service:
Committing service changes does not affect the devices since we have not defined the mapping. The service instance data will just be stored in NSO CDB.
Note that you get tab completion on the devices since they are leafrefs to device names in CDB; the same goes for interface-type since the types are enumerated in the model. However, the interface name is just a string, and you have to type the correct interface name. For service models where there is only one device type, like in this simple example, we could have used a reference to the ios interface name according to the IOS model. However, that makes the service model dependent on the underlying device types, and if another type is added, the service model needs to be updated, which is most often not desired. There are techniques to get tab completion even when the data type is a string, but this is omitted here for simplicity.
Make sure you delete the vlan service instance as above before moving on with the example.
Now it is time to define the mapping from service configuration to actual device configuration. The first step is to understand the actual device configuration. Hard-wire the VLAN towards a device as an example. This concrete device configuration is a boilerplate for the mapping; it shows the expected result of applying the service.
The concrete configuration above has the interface and VLAN hard-wired. This is what we now will make into a template instead. It is always recommended to start like the above and create a concrete representation of the configuration the template shall create. Templates are device-configuration where parts of the config are represented as variables. These kinds of templates are represented as XML files. Show the above as XML:
Now, we shall build that template. When the package was created, a skeleton XML file was created in packages/vlan/templates/vlan.xml.
We need to specify the right path to the devices. In our case, the devices are identified by /device-if/device-name (see the YANG service model).
For each of those devices, we need to add the VLAN and change the specified interface configuration. Copy the XML config from the CLI and replace it with variables:
Walking through the template can give a better idea of how it works. For every /device-if/device-name from the service model do the following:
Add the VLAN to the VLAN list. The tag merge tells the template to merge the data into an existing list (the default is to replace).
For every interface within that device, add the VLAN to the allowed VLANs and set the mode to trunk. The tag nocreate tells the template to not create the named interface if it does not exist.
It is important to understand that every path in the template above refers to paths from the service model in vlan.yang.
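The walk-through above can be expressed as a short function that mimics what the template does. This is a conceptual model, not the XML template: the dictionary layout for the service and the devices is an assumption made for the example, and the function name is hypothetical.

```python
def apply_vlan_template(service: dict, devices: dict) -> None:
    """Apply the VLAN mapping to the device configurations in place."""
    vlan_id = service["vlan-id"]
    for dev_if in service["device-if"]:
        device = devices[dev_if["device-name"]]

        # "merge": add to the existing VLAN list instead of replacing it.
        device["vlans"].add(vlan_id)

        # "nocreate": only touch the interface if it already exists.
        key = (dev_if["interface-type"], dev_if["interface"])
        interface = device["interfaces"].get(key)
        if interface is None:
            continue
        interface["mode"] = "trunk"
        interface["allowed-vlans"].add(vlan_id)
```

Existing VLANs on the device are preserved by the merge step, and an interface that does not exist on the device is skipped rather than created.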
Request NSO to reload the packages:
Previously, we started NSO with a reload package option; the above shows how to do the same without stopping and starting NSO.
We can now create services that will make things happen in the network. (Delete any dummy service from the previous step first). Create a VLAN service:
When working with services in templates, there is a useful debug option for commit that shows the template and XPath evaluation.
We can change the VLAN service:
It is important to understand what happens above. When the VLAN ID is changed, NSO can calculate the minimal required changes to the configuration. The same situation holds true for changing elements in the configuration or even parameters of those elements. In this way, NSO does not need explicit mapping to define a VLAN change or deletion. NSO does not overwrite a new configuration on the old configuration. Adding an interface to the same service works the same:
To clean up the configuration on the devices, run the delete command as shown below:
To make the VLAN service package complete, edit the package-meta-data.xml to reflect the service model purpose. This example showed how to use template-based mapping. NSO also allows for programmatic mapping, as well as a combination of the two approaches. The latter is very flexible when some logic needs to be attached to service provisioning that is expressed as templates, and the logic applies device-agnostic templates.
FASTMAP is the NSO algorithm that renders any service change from the single definition of the create service. As seen above, the template or code only has to define how the service shall be created; NSO is then capable of deriving any change from that single definition.
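The principle of deriving every change from a single create definition can be illustrated with a set difference. This is a deliberately simplified sketch, not the actual FASTMAP algorithm: the function names are hypothetical, the device configuration is reduced to (device, line) pairs, and the previous result is recomputed here instead of being stored with the service.

```python
def create(service: dict) -> set:
    """The only mapping that is written by hand: service to device config."""
    vlan_id = service["vlan-id"]
    return {(device, f"vlan {vlan_id}") for device in service["devices"]}


def derive_change(old_service, new_service) -> dict:
    """Derive what to add and delete by comparing two create() results.

    Pass None as old_service for a new service, or None as new_service
    when the service is deleted.
    """
    old = create(old_service) if old_service else set()
    new = create(new_service) if new_service else set()
    return {"delete": sorted(old - new), "add": sorted(new - old)}
```

Creating, modifying, and deleting a service all go through the same create function; no separate modify or delete mapping is written.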
A limitation in the scenarios described so far is that the mapping definition could immediately do its work as a single atomic transaction. This is sometimes not possible. Typical examples are external allocation of resources such as IP addresses from an IPAM, spinning up VMs, and sequencing in general.
Nano services using Reactive FASTMAP handle these scenarios with an executable plan that the system can follow to provision the service. The general idea is to implement the service as several smaller (nano) steps or stages by using Reactive FASTMAP, and to provide a framework to safely execute actions with side effects.
A very common situation when we wish to deploy NSO in an existing network is that the network already has existing services implemented. These services may have been deployed manually or through another provisioning system. The task is to introduce NSO and import the existing services into NSO. The goal is to use NSO to manage existing services, and to add additional instances of the same service type, using NSO. This is a non-trivial problem since existing services may have been introduced in various ways. Even if the service configuration has been done consistently, the problem resembles the challenge of finding a general solution for rendering a corresponding C program from assembler.
One of the prerequisites for this to work is that it is possible to construct a list of the already existing services. Maybe such a list exists in an inventory system, an external database, or maybe just an Excel spreadsheet. It may also be the case that we can:
Import all managed devices into NSO.
Execute a full sync-from on the entire network.
Write a program, using Python/Maapi or Java/Maapi that traverses the entire network configuration and computes the services list.
The first thing we must do when we wish to reconcile existing services is to define the service YANG model. The second thing is to implement the service mapping logic and do it in such a way that given the service input parameters when we run the service code, they would all result in a configuration that is already there in the existing network.
The basic principles for reconciliation are:
Read the device configuration to NSO using the sync-from action. This will get the device configuration that is a result of any existing services as well.
Instantiate the services according to the principles above.
Performing the above actions with the default behavior would not render the correct reference counters since NSO did not create the original configuration. The service activation can be run with dedicated flags to take this into account. See the NSO User Guide for a detailed process.
In many cases, a service activation solution like NSO is deployed in parallel with existing activation solutions. It is then desirable to make sure that NSO does not conflict with the device configuration rendered from the existing solution.
NSO has a commit flag that will restrict the device configuration to not overwrite data that NSO did not create: commit no-overwrite
Some services need to be set up in stages where each stage can consist of setting up some device configuration and then waiting for this configuration to take effect before performing the next stage. In this scenario, each stage must be performed in a separate transaction which is committed separately. Most often an external notification or other event must be detected and trigger the next stage in the service activation.
NSO supports the implementation of such staged services with the use of Reactive FASTMAP patterns in nano services.
From the user's perspective, it is not important how a certain service is implemented. The implementation should not have an impact on how the user creates or modifies a service. However, knowledge about this can be necessary to explain the behavior of a certain service.
In short, the life-cycle of an RFM nano service is not only controlled by the direct create/set/delete operations. Instead, there are one or many implicit reactive-re-deploy requests on the service that are triggered by external event detection. If the user examines an RFM service, e.g. using get-modification, the device impact will grow over time after the initial create.
Nano services will autonomously do reactive-re-deploy until all stages of the service are completed. This implies that a nano service normally is not completed when the initial create is committed. For the operator to understand that a nano service has run to completion, there must typically be some service-specific operational data that can indicate this.
Plans are introduced to standardize the operational data that can show the progress of the nano service. This gives the user a standardized view of all nano services and can directly answer the question of whether a service instance has run to completion or not.
A plan consists of one or more component entries. Each component consists of two or more state entries where the state can be in status not-reached, reached, or failed. A plan must have a component named self and can have other components with arbitrary names that have meaning for the implementing nano service. A plan component must have a first state named init and a last state named ready. In between init and ready, a plan component can have additional state entries with arbitrary naming.
The purpose of the self component is to describe the main progress of the nano service as a whole. Most importantly, the last state of the self component, named ready, must have the status reached if and only if the nano service as a whole has been completed. Other arbitrary components as well as states are added to the plan if they have meaning for the specific nano service, i.e. more specific progress reporting.
A plan also defines an empty leaf failed, which is set if and only if any state in any component has a status set to failed. As such, this is an aggregation that makes it easy to verify whether an RFM service is progressing without problems.
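The structural rules for a plan can be checked mechanically. The sketch below models a plan as a dictionary that maps component names to ordered lists of (state, status) pairs. This representation and the function names are assumptions made for illustration; they are not the NSO data model.

```python
VALID_STATUS = {"not-reached", "reached", "failed"}


def validate_plan(plan: dict) -> None:
    """Raise ValueError if the plan breaks one of the rules described above."""
    if "self" not in plan:
        raise ValueError("a plan must have a component named self")
    for name, states in plan.items():
        if len(states) < 2:
            raise ValueError(f"component {name} needs at least two states")
        if states[0][0] != "init" or states[-1][0] != "ready":
            raise ValueError(f"component {name} must start with init and end with ready")
        for state, status in states:
            if status not in VALID_STATUS:
                raise ValueError(f"{name}/{state}: unknown status {status}")


def plan_failed(plan: dict) -> bool:
    """The aggregated failed leaf: set if any state in any component failed."""
    return any(
        status == "failed" for states in plan.values() for _, status in states
    )


def service_completed(plan: dict) -> bool:
    """Completed if and only if the ready state of self is reached."""
    state, status = plan["self"][-1]
    return state == "ready" and status == "reached"
```

A caller can first run validate_plan and then use plan_failed and service_completed to answer the two questions an operator usually has: is anything wrong, and is the service done.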
The following is an illustration of using the plan to report the progress of a nano service:
Plans were introduced to standardize the operational data that show the progress of reactive fastmap (RFM) nano services. This gives the user a standardized view of all nano services and can answer the question of whether a service instance has run to completion or not. To keep track of the progress of plans, Service Progress Monitoring (SPM) is introduced. The idea with SPM is that time limits are put on the progress of plan states. To do so, a policy and a trigger are needed.
A policy defines which plan components and states need to be in which status for the policy to be true. A policy also defines how long it can be false without being considered jeopardized and how long it can be false without being considered violated. Further, it may define an action that is called when the policy is jeopardized, violated, or successful.
A trigger is used to associate a policy with a service and a component.
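The relationship between the two time limits can be sketched as a small function. The function name, the parameter names, and the "running" label for a policy that is still within its limits are assumptions made for this illustration; they are not NSO identifiers.

```python
def spm_status(policy_true: bool, elapsed: float,
               jeopardy_timeout: float, violation_timeout: float) -> str:
    """Classify a monitored service from the policy result and elapsed time.

    elapsed is the time (in seconds) since monitoring was triggered.
    The jeopardy limit is expected to be shorter than the violation limit.
    """
    if policy_true:
        return "successful"
    if elapsed >= violation_timeout:
        return "violated"
    if elapsed >= jeopardy_timeout:
        return "jeopardized"
    return "running"
```

For example, with a jeopardy limit of 60 seconds and a violation limit of 120 seconds, a policy that is still false after 90 seconds is classified as jeopardized.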
The following is an illustration of using an SPM to track the progress of an RFM service. In this case, the policy specifies that the ready state of the self component must be reached for the policy to be true:
NSO only cares about the data that is in the model for the NED. The rest is ignored. See the to learn more about what is covered for the NED.
Validation scripts can also be defined in Python, see more about that in .
As described in , the commit queue can be used to increase the transaction throughput. When the commit queue is for service activation, the services will have states reflecting outstanding commit queue items.
The first step is to generate a skeleton package for a service (for details, see ). Create a directory under, for example, ~/my-sim-ios similar to how it is done for the 1-simulated-cisco-ios/ example. Make sure that you have stopped any running NSO and netsim.
If this is your first exposure to YANG, you can see that the modeling language is very straightforward and easy to understand. See for more details and examples for YANG. The concept to understand in the above-generated skeleton is that the two lines of uses ncs:service-data
and ncs:servicepoint "vlan"
tell NSO that this is a service. The ncs:service-data
grouping together with the ncs:servicepoint
YANG extension provides the common definitions for a service. Both are defined in $NCS_DIR/src/ncs/yang/tailf-ncs-services.yang. So, if a user wants to create a new VLAN in the network, what should the parameters be? A very simple service model would look like the following (modify the src/yang/vlan.yang file):
The example in examples.ncs/development-guide/nano-services/netsim-sshkey
implements key generation to files and service deployment of the key to set up network elements and NSO for public key authentication to illustrate this concept. The example is described in more detail in .
Audit and verify your network for configuration compliance.
When the network configuration is broken, there is a need to gather information and verify the network. NSO has numerous functions to show different aspects of such a network configuration verification. However, to simplify this task, compliance reporting can assemble information using a selection of these NSO functions and present the resulting information in one report. This report aims to answer two fundamental questions:
Who has done what?
Is the network correctly configured?
What defines a correctly configured network? Where is the authoritative configuration kept? Naturally, NSO, with the configurations stored in CDB, is the authority. Checking the live devices against the NSO-stored device configuration is a fundamental part of compliance reporting. Compliance reporting can also be based on one or a number of stored templates which the live devices are compared against. The compliance reports can also be a combination of both approaches.
Compliance reporting can be configured to check the current situation, check historic events, or both. To assemble historic events, rollback files are used. Therefore, this functionality must be enabled in NSO before report execution; otherwise, the history view cannot be presented.
The reports can be created in plain text, HTML, or DocBook XML format. In addition, the data can be exported to a SQLite database file. The DocBook XML format allows you to use the report in further post-processing, such as creating a PDF using Apache FOP and your own custom styling.
It is possible to create several named compliance report definitions. Each named report defines the devices, services, and/or templates that should be part of the network configuration verification.
Let us walk through a simple compliance report definition. This example is based on the examples.ncs/service-provider/mpls-vpn
example. For the details of the included services and devices in this example, see the README
file.
First of all, each report has a name, which is the key in the report list. Furthermore, the report has a device-check and a service-check container for specifying devices and services to check. The compare-template list allows for specifying templates to compare device configurations against.
A report definition can specify all containers at the same time:
We will first use the device-check container to specify which devices to check. Devices can be defined in one of four different ways:
all-devices: Check all defined devices.
device-group: Specified list of device groups.
device: Specified list of devices.
select-devices: Specified by an XPath expression.
Furthermore, for a device-check, the behavior of the verification can be specified.
The default behavior for device verification is the following:
To request a check-sync
action to verify that the device is currently in sync. This behavior is controlled by the leaf current-out-of-sync
(default true
).
To scan the commit log (i.e. rollback files) for changes on the devices and report these. This behavior is controlled by the leaf historic-changes
(default true
).
We will choose the default behavior and check all devices:
In our example, we also use the service-check container to specify which services to check. Services can be defined in one of three ways:
all-services: Check all defined services.
service: Specified list of services.
select-services: Specified by an XPath expression.
Also for the service-check
, the verification behavior can be specified. The default behavior for service verification is the following:
To request a check-sync
action to verify that the service is currently in sync. This behavior is controlled by the leaf current-out-of-sync
(default true
).
To scan the commit log (i.e. rollback files) for changes on the services and report these. This behavior is controlled by the leaf historic-changes
(default true
).
In our report, we choose the default behavior and check the l3vpn
service:
Our next example will illustrate how to add a device template to the compliance report. This template will be used to compare against part of the device configuration. First, we define the device template:
We will also need a device group which will be used later in the report definition. For the sake of simplicity, in this example, we will just choose some of the ce
devices:
Now, we add the template to the already-defined report gold-check. An entry in the compare-template list contains the combination of a template and a device group, which implies that the template will be applied to all devices in the device group and the difference (if any) will be reported as a compliance violation. Note that no data will be changed on the device. Since the device template can contain variables, each compare-template entry also has a variable list.
In our example report, we use the gold-conf
template and the mygrp
group:
Since the gold-conf template uses variables, we will set the values for these variables in the report:
Compliance reporting is a read-only operation. When running a compliance report, the result is stored in a file located in a sub-directory compliance-reports
under the NSO state
directory. NSO has operational data for managing this report storage which makes it possible to list existing reports.
Here is an example of such a report listing:
There is also a remove
action to remove report results (and the corresponding file):
When running the report, there are a number of parameters that can be specified with the specific run
action.
The parameters that are possible to specify for a report run
action are:
title: The title in the resulting report.
from: The date and time from which the report should start the information gathering. If not set, the oldest available information is implied.
to: The date and time when the information gathering should stop. If not set, the current date and time are implied. If set, no new check-syncs of devices and/or services will be attempted.
outformat: One of xml, html, text, or sqlite. If xml is specified, the report will be formatted using the DocBook schema.
We will request a report run with a title
and formatted as text
.
In the above command, the report was run without a from
or a to
argument. This implies that historical information gathering will be based on all available information. This includes information gathered from rollback files.
When a from
argument is supplied to a compliance report run action, this implies that only historical information younger than the from
date and time is checked.
When a to
argument is supplied, this implies that historical information will be gathered for all logged information up to the date and time of the to
argument.
The from and to arguments can be combined to specify a fixed historic time interval.
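The interval semantics of from and to can be sketched in Python. This only illustrates how the two arguments bound the historic window and is not how NSO implements it; the function name in_report_window, the sample timestamps, and the choice to treat both boundaries as inclusive are assumptions made for the example.

```python
from datetime import datetime
from typing import Optional


def in_report_window(event_time: datetime,
                     start: Optional[datetime] = None,
                     end: Optional[datetime] = None) -> bool:
    """Return True if event_time falls inside the reporting window.

    start=None models an unset 'from' (oldest available information).
    end=None models an unset 'to' (up to the current date and time).
    Boundaries are treated as inclusive, which is an assumption.
    """
    if start is not None and event_time < start:
        return False
    if end is not None and event_time > end:
        return False
    return True


# Hypothetical commit timestamps, standing in for rollback file history.
events = [
    datetime(2024, 1, 10, 9, 0),
    datetime(2024, 2, 15, 12, 30),
    datetime(2024, 3, 20, 18, 45),
]

window_start = datetime(2024, 2, 1)
window_end = datetime(2024, 3, 1)

# Both arguments given: a fixed historic interval.
selected = [e for e in events if in_report_window(e, window_start, window_end)]
print(selected)

# Only 'from' given: everything younger than the start of the window.
print([e for e in events if in_report_window(e, start=window_start)])
```

Leaving both arguments unset selects every event, which corresponds to a report run based on all available information.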
When a compliance report is run, the action will respond with a flag indicating if any discrepancies were found. Also, it reports how many devices and services have been verified in total by the report.
Below is an example of a compliance report result (in text
format):
In some cases, it is insufficient to only check that the required configuration is present, as other configurations on the device can interfere with the desired functionality. For example, a service may configure a routing table entry for the 198.51.100.0/24 network. If someone also configures a more specific entry, say 198.51.100.0/28, that entry will take precedence and may interfere with the way the service requires the traffic to be routed. In effect, this additional configuration can render the service inoperable.
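The effect of a more specific routing entry can be illustrated with a small longest-prefix-match sketch in Python. The route labels and the lookup function are hypothetical and only model the general routing rule that the most specific matching prefix wins; they say nothing about how a particular device stores its routes.

```python
import ipaddress

# Two overlapping routes: the one a service configured and a more
# specific one that someone added afterwards (labels are illustrative).
routes = {
    ipaddress.ip_network("198.51.100.0/24"): "service route",
    ipaddress.ip_network("198.51.100.0/28"): "extraneous route",
}


def lookup(destination: str) -> str:
    """Return the label of the most specific route covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    if not matches:
        return "no route"
    # Longest prefix match: the largest prefix length wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]


print(lookup("198.51.100.5"))    # covered by both the /24 and the /28
print(lookup("198.51.100.200"))  # covered by the /24 only
```

Addresses in the range covered by the /28 no longer follow the route that the service configured, even though the service's own configuration is still present on the device.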
To help operators ensure there is no such extraneous configuration on the managed devices, the compliance reporting feature supports the so-called strict
mode. This mode not only checks whether the required configuration is present but also reports any configuration present on the device that is not part of the template.
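The difference between the default check and strict mode can be sketched as follows. This is a simplified model, not NSO's implementation: configurations are represented as flat dictionaries keyed by a made-up path string, and check_compliance is a hypothetical helper.

```python
def check_compliance(template, device_config, strict=False):
    """Compare a device configuration against a template.

    Both arguments are flat dicts mapping a configuration path to a value,
    a simplification chosen for this sketch.
    """
    violations = []
    # Required configuration: every template entry must be present and equal.
    for path, expected in template.items():
        actual = device_config.get(path)
        if actual != expected:
            violations.append(f"{path}: expected {expected!r}, found {actual!r}")
    # Strict mode: also report configuration the template does not cover.
    if strict:
        for path in device_config:
            if path not in template:
                violations.append(f"{path}: not part of the template")
    return violations


template = {"route/198.51.100.0/24": "192.0.2.1"}
device_config = {
    "route/198.51.100.0/24": "192.0.2.1",
    "route/198.51.100.0/28": "192.0.2.99",  # extra, more specific entry
}

print(check_compliance(template, device_config))               # default check
print(check_compliance(template, device_config, strict=True))  # strict mode
```

In this model the default check passes because the required entry is present, while strict mode flags the additional /28 entry.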
You can configure this mode in the report definition, when specifying the device template to check against, for example:
However, in practice, using the strict mode with device templates may prove challenging. Often, each device will have its own set of IP addresses configured. While you can supply variable values to the template, you likely need to maintain a separate set for each device (since each uses its own unique IPs).
One way to overcome this problem is to use services to configure all aspects of the managed device. But perhaps you only use NSO to configure a subset of all the services (configuration) on the devices. In this case, you can still perform generic configuration validation with the help of compliance templates, which are similar to, but separate from device templates.
With compliance templates, you use regular expressions to check compliance, instead of simple fixed values or variables, and they can be used with or without strict mode.
You can create a compliance template from scratch, similar to how you create a device template. To check that the router uses only internal DNS servers from the 10.0.0.0/8 range, you might create a compliance template such as:
Here, the value for the /sys/dns/server
must start with 10.
, followed by any string (regular expression .+
). Since a dot has a special meaning with regular expressions (any character), it must be escaped with a backslash to match only the actual dot character. But note the required multiple escaping (\\\\
) in this case.
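Why the dot needs escaping can be checked with Python's re module. This sketch only demonstrates the regular expression itself; the sample addresses are made up, and matching against the whole value (fullmatch) is an assumption about how the expression is applied. In a Python raw string, a single backslash is enough, unlike the multiple escaping needed in the template.

```python
import re

# Escaped dot: the value must literally start with "10." followed by
# one or more characters.
escaped = re.compile(r"10\..+")

# Unescaped dot: "." matches any character, so the third character
# does not have to be a dot.
unescaped = re.compile(r"10..+")

for server in ["10.2.3.4", "100.2.3.4", "192.0.2.53"]:
    print(server,
          bool(escaped.fullmatch(server)),
          bool(unescaped.fullmatch(server)))
```

The address 100.2.3.4 lies outside the 10.0.0.0/8 range, yet it is accepted by the unescaped expression, which is the mistake the escaping prevents.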
As these expressions can be non-trivial to construct, the templates have a check
command that allows you to quickly check compliance for a set of devices, which is a great development aid.
Alternatively, you can use the /compliance/create-template
action when you already have existing device templates that you would like to use as a starting point for a compliance template.
Finally, to use compliance templates in a report, reference them from device-check/template
:
Use NSO's network simulator to simulate your network and test functionality.
The ncs-netsim program is a useful tool to simulate a network of devices to be managed by NSO. It makes it easy to test NSO packages against simulated devices. All you need is the NSO NED packages for the devices that you need to simulate. The devices are simulated with the Tail-f ConfD product.
All the NSO examples use ncs-netsim
to simulate the devices. A good way to learn how to use ncs-netsim
is to study them.
The ncs-netsim
tool takes any number of NED packages as input. The user can specify the number of device instances per package (device type) and a string that is used as a prefix for the name of the devices. The command takes the following parameters:
Assume that you have prepared an NSO package for a device called router
. (See the examples.ncs/getting-started/developing-with-ncs/0-router-network
example). Also, assume the package is in ./packages/router
. At this point, you can create the simulated network by:
This creates three devices; device0
, device1
, and device2
. The simulated network is stored in the ./netsim
directory. The output structure is:
There is one separate directory for every ConfD simulating the devices.
The network can be started with:
You can add more devices to the network in a similar way as it was created, for example, if you created a network with some Juniper devices and want to add some Cisco IOS devices. Point to the NED you want to use (see ${NCS_DIR}/packages/neds/) and run the command. Remember to start the new devices after they have been added to the network.
To extract the device data from the simulated network to a file in XML format:
This data is usually used to load the simulated network into NSO. Putting the XML file in the ./ncs-cdb
folder will load it when NSO starts. If NSO is already started it can be reloaded while running.
The generated device data creates devices of the same type as the device being simulated. This is true for NETCONF, CLI, and SNMP devices. When simulating generic devices, the simulated device will run as a NETCONF device.
Under very special circumstances, one can choose to force running the simulation as a generic device with the option --force-generic
.
The simulated network device info can be shown with:
Here you can see the device name, the working directory, and the port number for different services to be accessed on the simulated device (NETCONF SSH, SNMP, IPC, and direct access to the CLI).
You can reach the CLI of individual devices with:
The simulated devices actually provide three different styles of CLI:
cli: J-Style
cli-c: Cisco XR Style
cli-i: Cisco IOS Style
Individual devices can be started and stopped with:
You can check the status of the simulated network, either in a short version that only shows whether the devices are running or in a more verbose version with all the information.
View which packages are used in the simulated network:
It is also possible to reset the network back to the state of initialization:
When you are done, remove the network:
The netsim tool includes a standard ConfD distribution and the ConfD C API library (libconfd) that the ConfD tools use. The library is built with default settings where the values for MAXDEPTH and MAXKEYLEN are 20 and 9, respectively. These values define the size of the confd_hkeypath_t struct, and this size is related to the size of data models in terms of depth and key lengths. The default values should be big enough even for very large and complex data models, but in some rare cases, one or both of these values might not be large enough for a given data model.
One might observe a limitation when the data models that are used by simulated devices exceed these limits. Then it would not be possible to use the ConfD tools that are provided with the netsim. To overcome this limitation, it is advised to use the corresponding NSO tools to perform desired tasks on devices.
NSO and ConfD tools and Python APIs are basically the same except for naming, the default IPC port and the MAXDEPTH and MAXKEYLEN values, where for NSO tools, the values are set to 60 and 18, respectively. Thus, the advised solution is to use the NSO tools and NSO Python API with netsim.
For example, instead of using the command below:
One may use:
The README file in examples.ncs/getting-started/developing-with-ncs/0-router-network
gives a good introduction on how to use ncs-netsim
.
View currently loaded packages.
NSO Packages contain data models and code for a specific function. It might be a NED for a specific device, a service application like MPLS VPN, a WebUI customization package, etc. Packages can be added, removed, and upgraded in run-time.
The currently loaded packages can be viewed with the following command:
Thus the above command shows that NSO currently has only one package loaded, the NED package for Cisco IOS. The output includes the name and version of the package, the minimum required NSO version, the Java components included, package build details, and finally the operational status of the package. The operational status is of particular importance - if it is anything other than up
, it indicates that there was a problem with the loading or the initialization of the package. In this case an item error-info
may also be present, giving additional information about the problem. To show only the operational status for all loaded packages, this command can be used:
Manipulate and manage existing services and devices.
Devices and services are the most important entities in NSO. Once created, they may be manipulated in several different ways. The three main categories of operations that affect the state of services and devices are:
Commit Flags: Commit flags modify the transaction semantics.
Device Actions: Explicit actions that modify the devices.
Service Actions: Explicit actions that modify the services.
The purpose of this section is more of a quick reference guide, an enumeration of commonly used commands. The context in which these commands should be used is found in other parts of the documentation.
Commit flags may be present when issuing a commit
command:
Some of these flags may be configured to apply globally for all commits, under /devices/global-settings
, or per device profile, under /devices/profiles
.
Some of the more important flags are:
and-quit: Exit to CLI operational mode after commit.
check: Validate the pending configuration changes. Equivalent to the validate command (see NSO CLI).
comment | label
: Add a commit comment/label visible in compliance reports, rollback files, etc.
dry-run
: Validate and display the configuration changes but do not perform the actual commit. Neither CDB nor the devices are affected. Instead, the effects that would have taken place are shown in the returned output. The output format can be set with the outformat
option. Possible output formats are: xml
, cli
, and native
.
The xml
format displays all changes in the whole data model. The changes will be displayed in NETCONF XML edit-config format, i.e., the edit-config that would be applied locally (at NCS) to get a config that is equal to that of the managed device.
The cli
format displays all changes in the whole data model. The changes will be displayed in CLI curly bracket format.
The native
format displays only changes under /devices/device/config
. The changes will be displayed in native device format. The native
format can be used with the reverse
option to display the device commands for getting back to the current running state in the network if the commit is successfully executed. Beware that if any changes are done later on the same data, the reverse
device commands returned are invalid.
no-networking
: Validate the configuration changes, and update the CDB but do not update the actual devices. This is equivalent to first setting the admin state to southbound locked, then issuing a standard commit. In both cases, the configuration changes are prevented from being sent to the actual devices.
If the commit implies changes, it will make the device out-of-sync.
The sync-to
command can then be used to push the change to the network.
no-out-of-sync-check
: Commit even if the device is out of sync. This can be used in scenarios where you know that the change you are doing is not in conflict with what is on the device and do not want to perform the action sync-from
first. Verify the result by using the action compare-config.
The device's sync state is assumed to be unknown after such commit and the stored last-transaction-id
value is cleared.
no-overwrite
: NSO will check that the data that should be modified has not changed on the device compared to NSO's view of the data. This is a fine-granular sync check; NSO verifies that NSO and the device are in sync regarding the data that will be modified. If they are not in sync, the transaction is aborted.
This parameter is particularly useful in brownfield scenarios where the device is always out of sync due to being directly modified by operators or other management systems.
The device's sync state is assumed to be unknown after such commit and the stored last-transaction-id
value is cleared.
no-revision-drop
: Fail if one or more devices have obsolete device models.
When NSO connects to a managed device the version of the device data model is discovered. Different devices in the network might have different versions. When NSO is requested to send configuration to devices, NSO defaults to drop any configuration that only exists in later models than the device supports. This flag forces NSO to never silently drop any data set operations towards a device.
no-deploy
: Commit without invoking the service create method, i.e., write the service instance data without activating the service(s). The service(s) can later be redeployed to write the changes of the service(s) to the network.
reconcile: Reconcile the service data. All data which existed before the service was created will now be owned by the service. When the service is removed, that data will also be removed. In technical terms, the reference count will be decreased by one for everything that existed before the service. If manually configured data exists below in the configuration tree, that data is kept unless the option discard-non-service-config is used.
use-lsa
: Force handling of the LSA nodes as such. This flag tells NSO to propagate applicable commit flags and actions to the LSA nodes without applying them on the upper NSO node itself. The commit flags affected are dry-run
, no-networking
, no-out-of-sync-check
, no-overwrite
, and no-revision-drop
.
no-lsa
: Do not handle any of the LSA nodes as such. These nodes will be handled as any other device.
commit-queue: Commit through the commit queue (see Commit Queue). While the configuration change is committed to CDB immediately, it is not committed to the actual device but rather queued for eventual commit to increase transaction throughput. This enables the use of the commit queue feature for individual commit commands without enabling it by default.
Possible operation modes are async
, sync
, and bypass
.
If the async
mode is set, the operation returns successfully if the transaction data has been successfully placed in the queue.
The sync mode will cause the operation to not return until the transaction data has been sent to all devices, or a timeout occurs. If the timeout occurs, the transaction data stays in the queue and the operation returns successfully. The timeout value can be specified with the timeout or infinity option. By default, the timeout value is determined by what is configured in /devices/global-settings/commit-queue/sync.
The bypass
mode means that if /devices/global-settings/commit-queue/enabled-by-default
is true
, the data in this transaction will bypass the commit queue. The data will be written directly to the devices. The operation will still fail if the commit queue contains one or more entries affecting the same device(s) as the transaction to be committed.
In addition, the commit-queue
flag has a number of other useful options that affect the resulting queue item:
The tag
option sets a user-defined opaque tag that is present in all notifications and events sent referencing the queue item.
The block-others
option will cause the resulting queue item to block subsequent queue items which use any of the devices in this queue item, from being queued.
The lock
option will place a lock on the resulting queue item. The queue item will not be processed until it has been unlocked, see the actions unlock
and lock
in /devices/commit-queue/queue-item
. No following queue items, using the same devices, will be allowed to execute as long as the lock is in place.
The atomic
option sets the atomic behavior of the resulting queue item. If this is set to false
, the devices contained in the resulting queue item can start executing if the same devices in other non-atomic queue items ahead of it in the queue are completed. If set to true
, the atomic integrity of the queue item is preserved.
Depending on the selected error-option
, NSO will store the reverse of the original transaction to be able to undo the transaction changes and get back to the previous state. This data is stored in the /devices/commit-queue/completed
tree from where it can be viewed and invoked with the rollback
action. When invoked, the data will be removed. Possible values are: continue-on-error
, rollback-on-error
, and stop-on-error
.
The continue-on-error
value means that the commit queue will continue on errors. No rollback data will be created.
The rollback-on-error
value means that the commit queue item will roll back on errors. The commit queue will place a lock on the failed queue item, thus blocking other queue items with overlapping devices from being executed. The rollback
action will then automatically be invoked when the queue item has finished its execution. The lock will be removed as part of the rollback.
The stop-on-error value means that the commit queue will place a lock on the failed queue item, thus blocking other queue items with overlapping devices from being executed. The lock must then either be released manually when the error is fixed, or the rollback action under /devices/commit-queue/completed must be invoked.
Read about error recovery in Commit Queue for a more detailed explanation.
trace-id: Use the provided trace ID as part of the log messages emitted while processing. If no trace ID is given, NSO generates and assigns a trace ID to the processing.
All commands in NSO can also have pipe commands. A useful pipe command for commit is details
:
This will give feedback on the steps performed in the commit.
When working with templates, there is a pipe command debug
which can be used to troubleshoot templates. To enable debugging on all templates use:
When configuring using many templates, the debug output can be overwhelming. For this reason, there is an option to get debug information for only one template, in this example a template named l3vpn:
Actions for devices can be performed globally on the /devices
path and for individual devices on /devices/device/name
. Many actions are also available on device groups as well as device ranges.
Service actions are performed on the service instance.
Use NSO's plug-and-play scripting mechanism to add new functionality to NSO.
A scripting mechanism can be used together with the CLI (scripting is not available for any other northbound interfaces). This section is intended for users who are familiar with UNIX shell scripting and/or programming. With the scripting mechanism, an end-user can add new functionality to NSO in a plug-and-play-like manner. No special tools are needed.
There are three categories of scripts:
command
scripts: Used to add new commands to the CLI.
policy
scripts: Invoked at validation time and may control the outcome of a transaction. Policy scripts have the mandate to cause a transaction to abort.
post-commit
scripts: Invoked when a transaction has been committed. Post-commit scripts can for example be used for logging, sending external events etc.
The terms 'script' and 'scripting' used throughout this description refer to how functionality can be added without a requirement for integration using the NSO programming APIs. NSO will only run the scripts as UNIX executables. Thus they may be written as shell scripts, or by using another scripting language that is supported by the OS, e.g., Python, or even as compiled code. The scripts are run with the same user ID as NSO.
The examples in this section are written using shell scripts as the least common denominator, but they can be written in another suitable language, e.g., Python or C.
Scripts are stored in a directory tree with a predefined structure where there is a sub-directory for each script category:
For all script categories, it suffices to just add a valid script in the correct sub-directory to enable the script. See the details for each script category for how a valid script of that category is defined. Scripts with a name beginning with a dot character ('.') are ignored.
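The discovery rule described above can be sketched in Python. This is not NSO's loader. It assumes that the sub-directories are named after the three script categories (command, policy, post-commit), and the find_scripts helper and the file names in the demonstration are hypothetical.

```python
import os
import tempfile

# Assumed sub-directory names, one per script category.
CATEGORIES = ("command", "policy", "post-commit")


def find_scripts(script_dir):
    """Return {category: [script names]} for one script directory.

    Names beginning with a dot are ignored, as the text describes.
    """
    found = {}
    for category in CATEGORIES:
        sub = os.path.join(script_dir, category)
        if not os.path.isdir(sub):
            found[category] = []
            continue
        found[category] = sorted(
            name for name in os.listdir(sub)
            if not name.startswith(".")
            and os.path.isfile(os.path.join(sub, name))
        )
    return found


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "command"))
    os.makedirs(os.path.join(root, "policy"))
    for category, name in [("command", "echo.sh"),
                           ("command", ".hidden.sh"),
                           ("policy", "check_dir.sh")]:
        open(os.path.join(root, category, name), "w").close()
    result = find_scripts(root)

print(result)
```

The dot-prefixed file is skipped, and a category without a sub-directory simply yields no scripts.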
The directory path to the location of the scripts is configured with the /ncs-config/scripts/dir
configuration parameter. It is possible to have several script directories. The sample ncs.conf
file that comes with the NSO release specifies two script directories: ./scripts
and ${NCS_DIR}/scripts
.
All scripts are required to provide a formal description of their interface. When the scripts are loaded, NSO will invoke the scripts with (one of) the following as an argument depending on the script category.
--command
--policy
--post-commit
The script must respond by writing its formal interface description on stdout
and exit normally. Such a description consists of one or more sections. Which sections are required, depends on the category of the script.
The sections do however have a common syntax. Each section begins with the keyword begin
followed by the type of section. After that one or more lines of settings follow. Each such setting begins with a name, followed by a colon character (:
), and after that the value is stated. The section ends with the keyword end
. Empty lines and spaces may be used to improve readability.
For examples see each corresponding section below.
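To make the section syntax concrete, the following Python sketch parses a description written in that syntax. It is a minimal reading of the rules stated above (begin plus type, name: value settings, end, blank lines allowed) and not NSO's own parser; parse_sections is a hypothetical helper, and the setting values in the sample text are illustrative.

```python
def parse_sections(text):
    """Parse 'begin <type> ... end' sections into a list of dicts."""
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # empty lines are only there for readability
        if line.startswith("begin "):
            if current is not None:
                raise ValueError("'begin' inside an open section")
            current = {"type": line[len("begin "):].strip(), "settings": {}}
        elif line == "end":
            if current is None:
                raise ValueError("'end' without a matching 'begin'")
            sections.append(current)
            current = None
        else:
            if current is None:
                raise ValueError(f"setting outside a section: {line!r}")
            # A setting is a name, a colon, and then the value.
            name, colon, value = line.partition(":")
            if not colon:
                raise ValueError(f"missing ':' in setting: {line!r}")
            current["settings"][name.strip()] = value.strip()
    if current is not None:
        raise ValueError("section not closed with 'end'")
    return sections


# Sample description; the values are made up for illustration.
DESCRIPTION = """
begin command
  modes: oper
  styles: c i j
  cmdpath: my script echo
  help: Display a line of text
end

begin param
  name: file
  presence: optional
  flag: -f
  help: Redirect output to file
end
"""

for section in parse_sections(DESCRIPTION):
    print(section["type"], section["settings"])
```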
Scripts are automatically loaded at startup and may also be manually reloaded with the CLI command script reload
. The command takes an optional verbosity
parameter which may have one of the following values:
diff: Shows info about those scripts that have been changed since the latest (re)load. This is the default.
all: Shows info about all scripts regardless of whether they have been changed or not.
errors: Shows info about those scripts that are erroneous, regardless of whether they have been changed or not. Typical errors are invalid file permissions and syntax errors in the interface description.
Yet another parameter may be useful when debugging the reload of scripts:
debug: Shows additional debug info about the scripts.
An example session reloading scripts:
Command scripts are used to add new commands to the CLI. The scripts are executed in the context of a transaction. When the script is run in oper mode, this is a read-only transaction; when it is run in config mode, it is a read-write transaction. In that context, the script may make use of the environment variables NCS_MAAPI_USID and NCS_MAAPI_THANDLE in order to attach to the active transaction. This makes it simple to make use of the ncs-maapi command (see the ncs-maapi(1) manual page in Manual Pages) for various purposes.
Each command script must be able to handle the argument --command
and, when invoked, write a command
section to stdout
. If the CLI command is intended to take parameters, one param
section per CLI parameter must also be emitted.
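Since scripts only need to be UNIX executables, a command script could also be written in Python. The sketch below shows the general shape under stated assumptions: the command path, help texts, and the single param section are made up, and the way the remaining arguments are echoed is purely illustrative. A real script would be called with sys.argv[1:] instead of the hard-coded lists used here for demonstration.

```python
import sys


def interface_description():
    """Return the formal interface description of this hypothetical command."""
    return "\n".join([
        "begin command",
        "  modes: oper",
        "  styles: c i j",
        "  cmdpath: my script echo",
        "  help: Display a line of text",
        "end",
        "",
        "begin param",
        "  presence: mandatory",
        "  words: any",
        "  help: Text to display",
        "end",
    ])


def main(argv):
    """Handle --command, otherwise act as the command itself."""
    if argv == ["--command"]:
        # Invoked at (re)load time: describe the interface and exit normally.
        sys.stdout.write(interface_description() + "\n")
        return 0
    # Invoked from the CLI: echo the words given by the user.
    sys.stdout.write(" ".join(argv) + "\n")
    return 0


# Demonstration with explicit argument lists.
main(["--command"])
main(["hello", "world"])
```

Keeping the description in one function makes it easy to check that every begin line has a matching end before the script is placed in the script directory.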
The command output is not paginated by default in the CLI; it is paginated only if it is piped to more.
command Section
The following settings can be used to define a command:
modes: Defines in which CLI mode(s) the command should be available. The value can be oper, config, or both (separated with space).
styles: Defines in which CLI styles the command should be available. The value can be one or more of c, i, and j (separated with space). c means Cisco XR style, i means Cisco IOS style, and j means J-style.
cmdpath
: Is the full CLI command path. For example, the command
path my script echo
implies that the command will be called my script echo
in the CLI.
help
: Command help text.
An example of a command
section is:
param Section
Now let's look at various aspects of a parameter. These may affect both the parameter syntax for the end-user in the CLI and what the command script gets as arguments.
The following settings can be used to customize each CLI parameter:
name
: Optional name of the parameter. If provided, the CLI will prompt for this name before the value. By default, the name is not forwarded to the script. See flag
and prefix
.
type
: The type of the parameter. By default each parameter has a value, but by setting the type to void
the CLI will not prompt for a value. To be useful the void
type must be combined with name
and either flag
or prefix
.
presence
: Controls whether the parameter must be present in the CLI input or not. Can be set to optional
or mandatory
.
words
: Controls the number of words that the parameter value may consist of. By default, the value must consist of just one word (possibly quoted if it contains spaces). If set to any
, the parameter may consist of any number of words. This setting is only valid for the last parameter.
flag
: Extra argument added before the parameter value. For example, if set to -f
and the user enters logfile
, the script will get -f logfile
as arguments.
prefix
: Extra string prepended to the parameter value (as a single word). For example, if set to --file=
and the user enters logfile
, the script will get --file=logfile
as argument.
help
: Parameter help text.
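The effect of flag, prefix, and the void type on the script's argument list can be modeled with a few lines of Python. This is only an illustration of the behavior described above, not NSO's implementation. The function name build_args and the dictionary layout are made up for the example, and the handling of a void parameter that has a prefix (the prefix alone is passed) is an assumption.

```python
def build_args(params, user_input):
    """Model how CLI input could be turned into script arguments.

    params:     list of dicts with optional keys 'name', 'type', 'flag', 'prefix'
    user_input: dict mapping a parameter index to the value the user entered
                (True for a void parameter that was given)
    """
    args = []
    for index, param in enumerate(params):
        if index not in user_input:
            continue  # optional parameter that the user left out
        value = user_input[index]
        is_void = param.get("type") == "void"
        if "flag" in param:
            # flag: an extra argument placed before the value
            args.append(param["flag"])
            if not is_void:
                args.append(value)
        elif "prefix" in param:
            # prefix: prepended to the value, forming a single word
            # (assumption: a void parameter passes the prefix on its own)
            args.append(param["prefix"] + ("" if is_void else value))
        elif not is_void:
            args.append(value)
    return args


print(build_args([{"flag": "-f"}], {0: "logfile"}))
print(build_args([{"prefix": "--file="}], {0: "logfile"}))
```

The two calls reproduce the flag and prefix examples from the list: the first yields the two arguments -f and logfile, the second the single argument --file=logfile.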
If the command takes a parameter to redirect the output to a file, a param section might look like this:
command Example
A command denying changes to the configured trace-dir for a set of devices can use the check_dir.sh script.
Calling $NCS_DIR/examples.ncs/getting-started/using-ncs/7-scripting/scripts/command/echo.sh with the --command argument produces a command section and a couple of param sections:
In the complete example $NCS_DIR/examples.ncs/getting-started/using-ncs/7-scripting, there is a README file and a simple command script scripts/command/echo.sh.
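The command script contract described above (answer --command with one command section plus one param section per parameter, otherwise do the work) can be sketched in Python. Several things here are assumptions: that an executable written in Python is acceptable as a script, that sections are framed with begin and end lines (the section listings are not reproduced in the text above), and the concrete setting values, which are illustrative only. The shipped echo.sh remains the reference.

```python
#!/usr/bin/env python3
import sys

# Assumed section syntax: "begin <section>" ... "end". Values are illustrative.
COMMAND_SECTION = """\
begin command
  modes: oper
  styles: c i j
  cmdpath: my script echo
  help: Display a line of text
end
"""

PARAM_SECTION = """\
begin param
  name: file
  presence: optional
  flag: -f
  help: Redirect output to file
end
"""


def describe():
    """Text written to stdout when NSO asks for the command definition."""
    return COMMAND_SECTION + "\n" + PARAM_SECTION


def main(args):
    if args[:1] == ["--command"]:
        sys.stdout.write(describe())
        return 0
    # Normal invocation from the CLI: echo the arguments back.
    print(" ".join(args))
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping the section text in one function makes it easy to check that every CLI parameter has a matching param section.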
Policy scripts are invoked at validation time before a change is committed. A policy script can reject the data, accept it, or accept it with a warning. If a warning is produced, it will be displayed for interactive users (e.g. through the CLI or Web UI). The user may choose to abort or continue to commit the transaction.
Policy scripts are typically assigned to individual leafs or containers. In some cases, it may be feasible to use a single policy script, e.g. on the top-level node of the configuration. In such a case, this script is responsible for the validation of all values and their relationships throughout the configuration.
All policy scripts are invoked on every configuration change. The policy scripts can be configured to depend on certain subtrees of the configuration, which can save time but it is very important that all dependencies are stated and also updated when the validation logic of the policy script is updated. Otherwise, an update may be accepted even though a dependency should have denied it.
There can be multiple dependency declarations for a policy script. Each declaration consists of a dependency element specifying a configuration subtree that the validation code is dependent upon. If any element in any of the subtrees is modified, the policy script is invoked. A subtree is specified as an absolute path.
If there are no declared dependencies, the root of the configuration tree (/) is used, which means that the validation code is executed when any configuration element is modified. If dependencies are declared on a leaf element, an implicit dependency on the leaf itself is added.
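The dependency rule can be modeled as a subtree match: a policy script runs when any changed path lies in one of its declared subtrees, and a script without declarations depends on the root. The sketch below is a simplified model under the assumption that paths are plain slash-separated strings (real keypaths also carry list keys); the function names are hypothetical.

```python
def is_under(path, subtree):
    """True if path equals subtree or lies somewhere below it."""
    if subtree == "/":
        return True
    subtree = subtree.rstrip("/")
    return path == subtree or path.startswith(subtree + "/")


def should_invoke(changed_paths, dependencies=None):
    """Decide whether a policy script needs to run for a set of changes."""
    # No declared dependencies: the root of the configuration tree is used.
    subtrees = dependencies or ["/"]
    return any(is_under(path, subtree)
               for path in changed_paths
               for subtree in subtrees)


changes = ["/devices/global-settings/trace-dir"]
print(should_invoke(changes, ["/devices/global-settings"]))
print(should_invoke(changes, ["/devices/authgroups"]))
print(should_invoke(changes))
```

Matching on subtree + "/" rather than on a bare string prefix avoids treating /devices/global-settings-old as part of /devices/global-settings.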
Each policy script must handle the argument --policy and, when invoked, write a policy section to stdout. The script must also perform the actual validation when invoked with the argument --keypath.
policy Section
The following settings can be used to configure a policy script:
keypath: Mandatory. The keypath is the path to a node in the configuration data tree. The policy script will be associated with this node. The path must be absolute. A keypath can, for example, be /devices/device/c0. The script will be invoked if the configuration node referred to by the keypath is changed, or if any node in the subtree under the node (if the node is a container or list) is changed.
dependency: Declaration of a dependency. The dependency must be an absolute keypath. Multiple dependency settings can be declared. Default is /.
priority: An optional integer parameter specifying the order in which policy scripts will be evaluated. Scripts are evaluated in order of increasing value, so a lower value means higher priority. The default priority is 0.
call: This optional setting can only be used if the associated node, declared as keypath, is a list. If set to once, the policy script is only called once even though there exist many list entries in the data store. This is useful if there is a huge number of instances or if values assigned to each instance have to be validated in comparison with their siblings. Default is each.
A policy that will be run for every change on or under /devices/device.
When NSO has concluded that the policy script should be invoked to perform its validation logic, the script is invoked with the option --keypath. If the registered node is a leaf, its value will be given with the --value option. For example, --keypath /devices/device/c0 or, if the node is a leaf, --keypath /devices/device/c0/address --value 127.0.0.1.
Once the script has performed its validation logic, it must exit with a proper status.
The following exit statuses are valid:
0: Validation ok. Vote for commit.
1: When the outcome of the validation is dubious, it is possible for the script to issue a warning message. The message is extracted from the script output on stdout. An interactive user can choose to abort or continue to commit the transaction. Non-interactive users automatically vote for commit.
2: When the validation fails, it is possible for the script to issue an error message. The message is extracted from the script output on stdout. The transaction will be aborted.
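The policy script contract (answer --policy with a policy section, validate when called with --keypath, exit with 0, 1, or 2) might be sketched in Python as follows. This is not the shipped check_dir.sh: the begin/end framing, the keypath /devices/global-settings/trace-dir, the setting values, and the validation rule itself are all assumptions chosen for illustration.

```python
#!/usr/bin/env python3
import sys

# Assumed section syntax and illustrative settings.
POLICY_SECTION = """\
begin policy
  keypath: /devices/global-settings/trace-dir
  dependency: /devices/global-settings
  priority: 4
  call: each
end
"""

OK, WARNING, ERROR = 0, 1, 2  # exit statuses listed above


def validate(keypath, value):
    """Hypothetical rule: the directory must be absolute, /tmp is dubious."""
    if value is None:
        return OK, ""
    if not value.startswith("/"):
        return ERROR, f"{keypath}: must be an absolute path"
    if value.startswith("/tmp"):
        return WARNING, f"{keypath}: files under /tmp may be cleaned up"
    return OK, ""


def main(args):
    if args[:1] == ["--policy"]:
        sys.stdout.write(POLICY_SECTION)
        return OK
    keypath = value = None
    remaining = iter(args)
    for arg in remaining:
        if arg == "--keypath":
            keypath = next(remaining, None)
        elif arg == "--value":
            value = next(remaining, None)
    status, message = validate(keypath, value)
    if message:
        print(message)  # the warning or error text is read from stdout
    return status


# Only act when called with arguments, as NSO does for policy scripts.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Returning the status from main and exiting in one place keeps the three outcomes easy to test without running the script as a process.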
policy Example
A policy denying changes to the configured trace-dir for a set of devices can use the check_dir.sh script.
Trying to change that parameter would result in an aborted transaction.
In the complete example $NCS_DIR/examples.ncs/getting-started/using-ncs/7-scripting/, there is a README file and a simple policy script scripts/policy/check_dir.sh.
Post-commit scripts are run when a transaction has been committed, but before any locks have been released. The transaction hangs until the script has returned. The script cannot change the outcome of the transaction. Post-commit scripts can for example be used for logging, sending external events etc. The scripts run as the same user ID as NSO.
The script is invoked with --post-commit at script (re)load. In future releases, it is possible that the post-commit section will be used for control of the post-commit scripts' behavior.
At post-commit, the script is invoked without parameters. In that context, the script may make use of the environment variables NCS_MAAPI_USID and NCS_MAAPI_THANDLE in order to attach to the active (read-only) transaction.
This makes it simple to make use of the ncs-maapi command. In particular, the command ncs-maapi --keypath-diff / may turn out to be useful, as it provides a listing of all updates within the transaction in a format that is easy to parse.
post-commit Section
All post-commit scripts must be able to handle the argument --post-commit and, when invoked, write an empty post-commit section to stdout:
post-commit Example
Assume the administrator of a system wants to receive an email each time a change is performed on the system. This can be done with a script such as mail_admin.sh:
If the admin then loads this script:
This configuration change will produce an email to admin@example.com with subject NCS Mailer and body.
In the complete example $NCS_DIR/examples.ncs/getting-started/using-ncs/7-scripting/, there is a README file and a simple post-commit script scripts/post-commit/show_diff.sh.
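The post-commit flow described above can be sketched in Python: reply to --post-commit with an empty section, and when run without parameters, collect the transaction diff through ncs-maapi --keypath-diff /. The begin/end framing of the section and the choice to print the diff to stdout are assumptions; the shipped show_diff.sh is the reference. NCS_MAAPI_USID and NCS_MAAPI_THANDLE are not handled explicitly because the child process inherits them from the environment.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Assumed section syntax for the empty post-commit section.
POST_COMMIT_SECTION = "begin post-commit\nend\n"


def transaction_diff(run=subprocess.run):
    """Return the output of `ncs-maapi --keypath-diff /`, or '' if it cannot run."""
    try:
        result = run(["ncs-maapi", "--keypath-diff", "/"],
                     capture_output=True, text=True, check=False)
    except OSError:
        # ncs-maapi is not on PATH; the script cannot change the outcome
        # of the transaction anyway, so it just reports nothing.
        return ""
    return result.stdout


def handle(args, run=subprocess.run):
    """Return (exit_status, text_to_print) for one invocation."""
    if args[:1] == ["--post-commit"]:
        return 0, POST_COMMIT_SECTION
    return 0, transaction_diff(run)


if __name__ == "__main__":
    status, text = handle(sys.argv[1:])
    sys.stdout.write(text)
```

Because the transaction hangs until the script returns, a real script would keep this step short, for example by handing the diff to a mailer or logger that does not block.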
Learn the concepts of NSO device management.
The NSO device manager is the center of NSO. The device manager maintains a flat list of all managed devices. NSO keeps the primary copy of the configuration for each managed device in CDB. Whenever a configuration change is done to the list of device configuration primary copies, the device manager will partition this network configuration change into the corresponding changes for the managed devices. The device manager passes on the required changes to the NEDs (Network Element Drivers). A NED needs to be installed for every type of device OS, like Cisco IOS NED, Cisco XR NED, Juniper JUNOS NED, etc. The NEDs communicate through the native device protocol southbound.
The NEDs fall into the following categories:
NETCONF-capable device: The Device Manager will produce NETCONF edit-config RPC operations for each participating device.
SNMP device: The Device Manager translates the changes made to the configuration into the corresponding SNMP SET PDUs.
Device with Cisco CLI: The device has a CLI with the same structure as Cisco IOS or XR routers. The Device Manager and a CLI NED are used to produce the correct sequence of CLI commands which reflects the changes made to the configuration.
Other devices: For devices that do not fit into any of the above-mentioned categories, a corresponding Generic NED is invoked. Generic NEDs are used for proprietary protocols like REST and for CLI flavors that do not resemble IOS or XR. The Device Manager will inform the Generic NED about the made changes and the NED will translate these to the appropriate operations toward the device.
NSO orchestrates an atomic transaction with a very desirable characteristic: either the transaction as a whole ends up on all participating devices and in the NSO primary copy, or the whole transaction is aborted and all changes are automatically rolled back.
The architecture of the NETCONF protocol is the enabling technology that makes it possible to push out configuration changes to managed devices and then, in the case of errors, roll back the changes. Devices that do not support NETCONF, i.e., devices that do not have transactional capabilities, can also participate. However, depending on the device, error recovery may not be as good as it is for a proper NETCONF-enabled device.
To understand the main idea behind the NSO device manager it is necessary to understand the NSO data model and how NSO incorporates the YANG data models from the different managed devices.
The NEDs will publish YANG data models even for non-NETCONF devices. In the case of SNMP, the YANG models are generated from the MIBs. For JunOS devices, the JunOS NED generates a YANG model from the JunOS XML Schema. For schema-less devices like CLI devices, the NED developer writes YANG models corresponding to the CLI structure. As a result, the device manager and NSO CDB have YANG data models for all devices, independent of the underlying protocol.
Throughout this section, we will use the examples.ncs/service-provider/mpls-vpn example. The example network consists of Cisco ASR 9k and Juniper core routers (P and PE) and Cisco IOS-based CE routers.
The central part of the NSO YANG model, in the file tailf-ncs-devices.yang, has the following structure:
Each managed device is uniquely identified by its name, which is a free-form text string. This is typically the DNS name of the managed device but could equally well be the string format of the IP address of the managed device or anything else. Furthermore, each managed device has a mandatory address/port pair that, together with the authgroup leaf, provides information to NSO on how to connect and authenticate over SSH/NETCONF to the device. Each device also has a mandatory parameter device-type that specifies which southbound protocol to use for communication with the device.
The following device types are available:
NETCONF
CLI: A corresponding CLI NED is needed to communicate with the device. This requires YANG models with the appropriate annotations for the device CLI.
SNMP: The device speaks SNMP, preferably in read-write mode.
Generic NED: A corresponding Generic NED is needed to communicate with the device. This requires YANG models and Java code.
The NSO CLI command below lists the NED types for the devices in the example network.
The empty container /ncs:devices/device/config is used as a mount point for the YANG models from the different managed devices.
As previously mentioned, NSO needs the following information to manage a device:
The IP/Port of the device and authentication information.
Some or all of the YANG data models for the device.
In the example setup, the address and authentication information are provided in the NSO database (CDB) initialization file. There are many different ways to add new managed devices. All of the NSO northbound interfaces can be used to manipulate the set of managed devices. This will be further described later.
Once NSO has started you can inspect the meta information for the managed devices through the NSO CLI. This is an example session:
Alternatively, this information could be retrieved from the NSO northbound NETCONF interface by running the simple Python-based netconf-console program towards the NSO NETCONF server.
All devices in the above two examples (Show Device Configuration in NSO CLI and Show Device Configuration in NETCONF) have /devices/device/state/admin-state set to unlocked. This will be described later in this section.
To communicate with a managed device, a NED for that device type needs to be loaded by NSO. A NED contains the YANG model for the device and corresponding driver code to talk CLI, REST, SNMP, etc. NEDs are distributed as packages.
The CLI command in the above example (Installed Packages) shows all the loaded packages. NSO loads packages at startup and can reload packages at run-time. By default, the packages reside in the packages directory in the NSO run-time directory.
Once you have access to the network information for a managed device, its IP address and authentication information, as well as the data models of the device, you can actually manage the device from NSO.
You start the ncs daemon in a terminal like:
This is the same as the following, where NSO loads its configuration from an ncs.conf file:
During development, it is sometimes convenient to run ncs in the foreground as:
Once the daemon is running, you can issue the command:
To get more information about options to ncs, do:
The ncs --status command produces a lengthy list describing, for example, which YANG modules are loaded in the system. This is a valuable debug tool.
The same information is also available in the NSO CLI (and thus through all available northbound interfaces, including MAAPI for Java programmers):
When the NSO daemon is running and has been initialized with IP/Port and authentication information as well as imported all modules you can start to manage devices through NSO.
NSO provides the ability to synchronize the configuration to or from the device. If you know that the device has the correct configuration you can choose to synchronize from a managed device whereas if you know NSO has the correct device configuration and the device is incorrect, you can choose to synchronize from NSO to the device.
In the normal case, the configuration on the device and the copy of the configuration inside NSO should be identical.
In a cold start situation like in the mpls-vpn example, where NSO is empty and there are network devices to talk to, it makes sense to synchronize from the devices. You can choose to synchronize from one device at a time or from all devices at once. Here is a CLI session to illustrate this.
The command devices sync-from, in the example (Synchronize from Devices), is an action that is defined in the NSO data model. It is important to understand the model-driven nature of NSO. All devices are modeled in YANG, network services like MPLS VPN are also modeled in YANG, and the same is true for NSO itself. Anything that can be performed over the NSO CLI or any northbound interface is defined in the YANG files. The NSO YANG files are located here:
All packages come with YANG files as well. For example, the directory packages/cisco-ios/src/yang/ contains the YANG definition of an IOS device.
The tailf-ncs.yang module is the main part of the NSO YANG data model. The file tailf-ncs.yang includes all parts of the model from different files.
The actions sync-from and sync-to are modeled in the file tailf-ncs-devices.yang. The sync action(s) are defined as:
Synchronizing from NSO to the device is common when a device has been configured out-of-band. NSO has no means to enforce that devices are not directly reconfigured behind the scenes of NSO; however, once an out-of-band configuration has been performed, NSO can detect the fact. When this happens it may (or may not, depending on the situation at hand) make sense to synchronize from NSO to the device, i.e. undo the rogue reconfigurations.
The command to do that is:
A dry-run option is available for the action sync-to.
This makes it possible to investigate the changes before they are transmitted to the devices.
sync-from
It is possible to synchronize a part of the configuration (a certain subtree) from the device using the partial-sync-from action located under /devices. While it is primarily intended to be used by service developers as described in Partial Sync, it is also possible to use it directly from the NSO CLI (or any other northbound interface). The example below (Example of Running partial-sync-from Action via CLI) illustrates using this action via CLI, using a router device from examples.ncs/getting-started/developing-with-ncs/0-router-network.
It is now possible to configure several devices through NSO inside the same network transaction. To illustrate this, start the NSO CLI from a terminal application.
The example above (Configure Devices) illustrates a multi-host transaction. In the same transaction, three hosts were re-configured. Had one of them failed, or been non-operational, the transaction as a whole would have failed.
As seen from the output of the command commit dry-run outformat native, NSO generates the native CLI and NETCONF commands which will be sent to each device when the transaction is committed.
Since the /devices/device/config path contains different models depending on the augmented device model, NSO uses the data model prefix in the CLI names: ios, cisco-ios-xr, and junos. Different data models might use the same name for elements, and the prefix avoids name clashes.
NSO uses different underlying techniques to implement the atomic transactional behavior in case of any error. NETCONF devices are straightforward, using confirmed commit. For CLI devices like IOS, NSO calculates the reverse diff to restore the configuration to the state before the transaction was applied.
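The reverse-diff idea for devices without native rollback can be illustrated with a toy model. This is a sketch under strong simplifying assumptions: a device configuration is a flat dictionary, and operations are plain set and delete tuples. A real CLI NED works on YANG-modeled trees and has to respect command ordering. All names are hypothetical.

```python
def diff(before, after):
    """Operations that turn the configuration `before` into `after`."""
    ops = []
    for key in sorted(before.keys() - after.keys()):
        ops.append(("delete", key, None))
    for key in sorted(after.keys()):
        if key not in before or before[key] != after[key]:
            ops.append(("set", key, after[key]))
    return ops


def reverse_diff(before, after):
    """Operations that undo diff(before, after): the diff in the other direction."""
    return diff(after, before)


def apply_ops(config, ops):
    """Return a new configuration with the operations applied."""
    result = dict(config)
    for op, key, value in ops:
        if op == "delete":
            result.pop(key, None)
        else:
            result[key] = value
    return result


before = {"hostname": "ce0", "snmp community": "public"}
after = {"hostname": "ce0-lab", "ntp server": "10.0.0.1"}
pushed = apply_ops(before, diff(before, after))
restored = apply_ops(pushed, reverse_diff(before, after))
print(pushed == after, restored == before)
```

The final line prints True True: applying the forward diff reaches the intended configuration, and applying the reverse diff afterwards restores the state from before the transaction.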
Each managed device needs to be configured with the IP address and the port where the CLI, NETCONF server, etc. of the managed device listens for incoming requests.
Connections are established on demand as they are needed. It is possible to explicitly establish connections, but that functionality is mostly there for troubleshooting connection establishment. We can, for example, do:
We were able to connect to all managed devices. It is also possible to explicitly attempt to test connections to individual managed devices:
Established connections are typically not closed right away when not needed, but rather pooled according to the rules described in Device Session Pooling. This applies to NETCONF sessions as well as sessions established by CLI or generic NEDs via a connection-oriented protocol. In addition to session pooling, underlying SSH connections for NETCONF devices are also reused. Note that a single NETCONF session occupies one SSH channel inside an SSH connection, so multiple NETCONF sessions can co-exist in a single connection. When an SSH connection has been idle (no SSH channels open) for 2 minutes, the SSH connection is closed. If a new connection is needed later, a connection is established on demand.
Three configuration parameters can be used to control the connection establishment: connect-timeout, read-timeout, and write-timeout. In the NSO data model file tailf-ncs-devices.yang, these timeouts are modeled as:
Thus, to change these parameters (globally for all managed devices) you do:
Or, to use a profile:
When NSO connects to a managed device, it requires authentication information for that device. The authgroups are modeled in the NSO data model:
Each managed device must refer to a named authgroup. The purpose of an authentication group is to map local users to remote users together with the relevant SSH authentication information.
Southbound authentication can be done in two ways. One is to configure the stored user and credential components as shown in the example below (Configured authgroup) and the next example (authgroup default-map). The other way is to configure a callback to retrieve user and credentials on demand as shown in the example below (authgroup-callback).
In the example above (Configured authgroup), in the authgroup named default, the two local users oper and admin shall use the remote user names oper and admin respectively, with identical passwords.
Inside an authgroup, all local users need to be enumerated. Each local user name must have credentials configured which should be used for the remote host. In centralized AAA environments, this is usually a bad strategy. You may also choose to instantiate a default-map. If you do that, it probably only makes sense to specify that the same user name/password pair should be used remotely as the pair that was used to log into NSO.
In the example (Configured authgroup), only two users, admin and oper, were configured. If the default-map in the example (authgroup default-map) is configured, all local users not found in the umap list will end up in the default-map. For example, suppose the user rocky logs in to NSO with the password secret. Since NSO has a built-in SSH server and also a built-in HTTPS server, NSO will be able to pick up the clear text passwords and can then reuse the same password when NSO attempts to establish southbound SSH connections. The user rocky will end up in the default-map, and when NSO attempts to propagate rocky's changes towards the managed devices, NSO will use the remote user name rocky with whatever password rocky used to log into NSO.
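The lookup order described above (an explicit umap entry first, then the default-map, with the option to reuse the login password) can be modeled in a few lines of Python. This is an illustration only: the dictionary layout and the key names remote-name and remote-password are assumptions made for the example, and public key and callback credentials are left out.

```python
def resolve_credentials(authgroup, local_user, login_password=None):
    """Return (remote_user, remote_password) for a local NSO user."""
    entry = authgroup.get("umap", {}).get(local_user)
    if entry is None:
        # Users that are not enumerated in umap fall through to default-map.
        entry = authgroup.get("default-map")
    if entry is None:
        raise LookupError(f"no mapping for local user {local_user!r}")

    # Without an explicit remote name, the same user name is used remotely.
    remote_user = entry.get("remote-name", local_user)
    if entry.get("same-pass"):
        if login_password is None:
            raise LookupError("same-pass requires the password used at login")
        return remote_user, login_password
    return remote_user, entry["remote-password"]


default_group = {
    "umap": {
        "oper": {"remote-name": "oper", "remote-password": "oper-pw"},
        "admin": {"remote-name": "admin", "remote-password": "admin-pw"},
    },
    "default-map": {"same-pass": True},
}
print(resolve_credentials(default_group, "oper"))
print(resolve_credentials(default_group, "rocky", "secret"))
```

With this data, oper resolves through its umap entry, while rocky is not listed and therefore resolves through the default-map to the remote user rocky with the password given at login.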
Authenticating southbound using stored configuration requires two main components to be defined: the remote user and the remote credentials. Both are defined by the authgroup. For the southbound user, there are two options: the same user that is logged in to NSO, or another user, as specified in the authgroup. For the credentials, there are three options:
Regular password.
Public key. This means that a private key, either from a file in the user's SSH key directory, or one that is configured in the /ssh/private-key list in the NSO configuration, is used for authentication. Refer to Publickey Authentication for the details on how the private key is selected.
Finally, an interesting option is to use the 'same-pass' option. Since NSO runs its own SSH server and its own SSL server, NSO can pick up the password of a user in clear text. Hence, if the 'same-pass' option is chosen for an authgroup, NSO will reuse the same password when attempting to connect southbound to a managed device.
In the case of authenticating southbound using a callback, remote user and remote credentials are obtained by an action invocation. The action is defined by the callback-node and action-name as in the example below (authgroup-callback), and supported credentials are remote password and optionally a secondary password for the provided local user, authgroup, and device.
With remote passwords, you may encounter issues if you use special characters, such as quotes (") and backslash (\) in your password. See Configure Mode for recommendations on how to avoid running into password issues.
In the example above (authgroup-callback), the configuration for the umap entry of the oper user is changed to use a callback to retrieve southbound authentication credentials. Thus, NSO is going to invoke the action auth-cb defined in the callback-node callback. The callback node is of type instance-identifier and refers to the container called callback defined in the example (authgroup-callback.yang), which includes an action defined by action-name auth-cb and uses the groupings authgroup-callback-input-params and authgroup-callback-output-params for input and output parameters respectively. In the example (authgroup-callback), the authgroup-callback module was loaded in NSO within an example package. Package development and action callbacks are not described here, but more can be read in Package Development, the section called DP API, and Python API Overview.
Authentication groups and the functionality they bring come with some limitations on where and how they are used.
The callback option that enables the authgroup-callback feature is not applicable for members of the snmp-group list.
Generic devices that implement their own authentication scheme do not use any mapping or callback functionality provided by authgroups.
Cluster nodes use their own authgroups and mapping model; thus functionality differs, e.g. the callback option is not applicable.
Opening a session towards a managed device is potentially time and resource-consuming. Also, the probability that a recently accessed device is still subject to further requests is reasonably high. These are motives for having a managed devices session pool in NSO.
The NSO device session pool is by default active and normally needs no maintenance. However, under certain circumstances, it might be of interest to modify its behavior. Examples can be when some device type has characteristics that make session pooling undesired, or when connections to a specific device are very costly, and therefore the time that open sessions can stay in the pool should increase.
Changes from the default configuration of the NSO session pool should only be performed when absolutely necessary and when all effects of the change are understood.
NSO presents operational data that represent the current state of the session pool. To visualize this, we use the CLI to connect to NSO and force connection to all known devices:
We can now list all open sessions in the session-pool. But note that this is a live pool. Sessions will only remain open for a certain amount of time, the idle time.
In addition to the idle time for sessions, we can also see the type of device, current number of pooled sessions, and maximum number of pooled sessions.
We can close pooled sessions for specific devices.
And we can close all pooled sessions in the session pool.
The session pool configuration is found in the tailf-ncs-devices.yang submodel. The following part of the YANG device-profile-parameters grouping controls how the session pool is configured:
This grouping can be found in the NSO model under /ncs:devices/global-settings/session-pool, /ncs:devices/profiles/profile/session-pool, and /ncs:devices/device/session-pool to be able to control session pooling for all devices, a group of devices, and a specific device respectively.
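Having the same grouping at three levels suggests a most-specific-wins lookup. The sketch below models that lookup in Python under the assumption that a device setting overrides the setting of its profile, which in turn overrides the global setting; the text above does not spell out this precedence. The key names idle-time, max-sessions, and device-profile, as well as the function name, are assumptions for the example.

```python
def effective_session_pool(device, profiles, global_settings):
    """Merge session-pool settings; later updates are more specific and win."""
    merged = dict(global_settings)
    # Settings from the profile that the device refers to, if any.
    merged.update(profiles.get(device.get("device-profile"), {}))
    # Settings configured directly on the device.
    merged.update(device.get("session-pool", {}))
    return merged


global_settings = {"idle-time": 100}
profiles = {"small": {"max-sessions": 3}}

ce0 = {"device-profile": "small"}
pe0 = {"session-pool": {"max-sessions": 0}}

print(effective_session_pool(ce0, profiles, global_settings))
print(effective_session_pool(pe0, profiles, global_settings))
```

Here ce0 picks up the idle time from the global settings and the session maximum from its profile, while pe0 carries its own maximum of 0 next to the global idle time.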
In addition, under /ncs:devices/global-settings/session-pool/default it is possible to control the global max size of the session pool, as defined by the following YANG snippet:
Let's illustrate the possibilities with an example configuration of the session pool:
In the above configuration, the default idle time is set to 100 seconds for all devices. A device profile called small is defined which contains a max-sessions value of 3, and this profile is set on all ce* devices. The device pe0 has max-sessions set to 0, which implies that this device cannot be pooled. Let's connect all devices and see what happens in the session pool:
Now, we set an upper limit to the maximum number of sessions in the pool. Setting the value to 4 is too small for a real situation but serves the purpose of illustration:
The number of open sessions in the pool will be adjusted accordingly:
Some devices only allow a small number of concurrent sessions; in the extreme case, a device allows only one (for example, through a terminal server). For this reason, NSO can limit the number of concurrent sessions to a device and make operations wait if the maximum number of sessions has been reached.
In other situations, we need to limit the number of concurrent connect attempts made by NSO. For example, the devices managed by NSO talk to the same server for authentication which can only handle a limited number of connections at a time.
The configuration for session limits is found in the tailf-ncs-devices.yang submodel. The following part of the YANG device-profile-parameters grouping controls how the session limits are configured:
This grouping can be found in the NSO model under /ncs:devices/global-settings/session-limits, /ncs:devices/profiles/profile/session-limits, and /ncs:devices/device/session-limits to be able to control session limits for all devices, a group of devices, and a specific device respectively.
In addition, under /ncs:devices/global-settings/session-limits, it is possible to control the number of concurrent connect attempts allowed and the maximum time to wait for a device to be available to connect.
It is possible to turn on and off NED traffic tracing. This is often a good way to troubleshoot problems. To understand the trace output, a basic prerequisite is a good understanding of the native device interface. For NETCONF devices, an understanding of NETCONF RPC is a prerequisite. Similarly for CLI NEDs, a good understanding of the CLI capabilities of the managed devices is required.
To turn on southbound traffic tracing, we need to enable the feature and we must also configure a directory where we want the trace output to be written. It is possible to have the trace output in two different formats, pretty and raw. The format of the data depends on the type of the managed device. For NETCONF devices, the pretty mode indents all the XML data for enhanced readability and the raw mode does not. Sometimes when the XML is broken, raw mode is required to see all the data received. Tracing in raw mode will also signal to the corresponding NED to log more verbose tracing information.
To enable tracing, do:
The trace setting only affects new NED connections, so to ensure that we get any tracing data, we can do:
The above command terminates all existing connections.
At this point, you can execute a transaction towards one or several devices and then view the trace data.
It is possible to clear all existing trace files through the command:
Finally, it is worth mentioning that the trace functionality does not come for free. It is fairly costly to have the trace turned on. Also, there exists no trace log wrapping functionality.
When managing large networks with NSO, a good strategy is to consider the NSO copy of the network configuration to be the primary copy. All device configuration changes must go through NSO, and all other device re-configurations are considered rogue.
NSO does not contain any functionality which disallows rogue re-configurations of managed devices. However, it does contain a mechanism that makes it a very cheap operation to discover if one or several devices have been configured out-of-band.
The underlying mechanism for cheap check-sync is to compare time stamps, transaction IDs, hash-sums, etc., depending on what the device supports. This avoids having to read the full configuration to check if the NSO copy is in sync.
The transaction IDs are stored in CDB and can be viewed as:
Some of the devices do not have a transaction ID. This is the case where the NED has not implemented the cheap check-sync mechanism. Although it is called transaction-id, the underlying value in the device can be anything that detects a config change, for example a time-stamp.
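The cheap check-sync idea boils down to comparing a small token instead of the full configuration. The sketch below models that comparison in Python. It is an illustration, not NSO's implementation: the function name, the callable used to read the token from the device, and the result strings in-sync, out-of-sync, and unsupported are assumptions made for the example.

```python
def check_sync(stored_id, read_device_id):
    """Compare the token stored in NSO with the one the device reports now.

    stored_id:      token saved when NSO last synchronized with the device,
                    or None if the NED provides no such token
    read_device_id: callable that fetches the current token from the device
                    (a transaction ID, time stamp, hash-sum, ...)
    """
    if stored_id is None:
        # The NED has not implemented the cheap mechanism.
        return "unsupported"
    if read_device_id() == stored_id:
        return "in-sync"
    return "out-of-sync"


print(check_sync("1412-581909-661436", lambda: "1412-581909-661436"))
print(check_sync("1412-581909-661436", lambda: "1412-582001-104233"))
print(check_sync(None, lambda: "anything"))
```

Only the token crosses the wire in this model, which is what makes the check cheap compared to fetching and comparing the whole device configuration.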
To check for consistency, we execute:
Alternatively for all (or a subset) managed devices:
The following YANG grouping is used for the return value from the check-sync command:
In the previous section, we described how we can easily check if a managed device is in sync. If the device is not in sync, we are interested to know what the difference is. The CLI sequence below shows how to modify ce0 out-of-band using the ncs-netsim tool. Finally, the sequence shows how to do an explicit configuration comparison.
The diff in the above output should be interpreted as: what needs to be done in NSO to become in sync with the device.
Previously in the example (Synchronize from Devices), NSO was brought in sync with the devices by fetching configuration from the devices. In this case, where the device has a rogue re-configuration, NSO has the correct configuration. In such cases, you want to reset the device configuration to what is stored inside NSO.
When you decide to reset the configuration with the copy kept in NSO, use the option dry-run in conjunction with sync-to and inspect what will be sent to the device:
As this is the desired data to send to the device, a sync-to can now safely be performed.
The device configuration should now be in sync with the copy in NSO and compare-config ought to yield an empty output:
There exist several ways to initialize new devices. The two common ways are to initialize a device from another existing device or to use device templates.
For example, another CE router has been added to our example network. You want to base the configuration of that host on the configuration of the managed device ce0, which has a valid configuration:
If the configuration is accurate you can create a new managed device based on that configuration as:
In the example above (Instantiate Device from Other), the commands first create the new managed device, ce9, and then populate the configuration of the new device based on the configuration of ce0.
This new configuration might not be entirely correct; you can modify any configuration before committing it.
The above concludes the instantiation of a new managed device. The new device configuration is committed and NSO returned OK without the device existing in the network (netsim). Try to force a sync to the device:
The device is southbound locked. This is a mode in which you can reconfigure a device, but any changes made to it are never sent to the managed device. This will be thoroughly described in the next section. Devices are by default created southbound locked. Default values are not shown if not explicitly requested:
Another alternative to instantiating a device from the actual working configuration of another device is to have a number of named device templates that manipulate the configuration.
The template tree looks like this:
The tree for device templates is generated from all device YANG models. All constraints are removed and the data type of all leafs is changed to string.
A device template is created by setting the desired data in the configuration. The created device template is stored in NSO CDB.
The device template created in the example above (Create ce-initialize template) can now be used to initialize single devices or device groups, see Device Groups.
In the following CLI session, a new device ce10 is created:
Initialize the newly created device ce10 with the device template ce-initialize:
When initializing devices, NSO does not have any knowledge about the capabilities of the device, since no connection has been made. This can be overridden by the option accept-empty-capabilities.
Inspect the changes made by the template ce-initialize
This section shows how device templates can be used to create and change device configurations. See Introduction in Templates for other ways of using templates.
Device templates are part of the NSO configuration. Device templates are created and changed in the tree /devices/template/config the same way as any other configuration data and are affected by rollbacks and upgrades. Device templates can only manipulate configuration data in the /devices/device/config tree, i.e., only device data.
The $NCS_DIR/examples.ncs/service-provider/mpls-vpn example comes with a pre-populated template for SNMP settings.
The variable $DEVICE is used internally by NSO and cannot be used in a template.
Templates can be created like any configuration data and use the CLI tab completion to navigate. Variables can be used instead of hard-coded values. In the template above the community string is a variable. The template can cover several device types/NEDs, by making use of the namespace information. This will make sure that only devices modeled with this particular namespace will be affected by this part of the template. Hence, it is possible for one template to handle a multitude of devices from various manufacturers.
A template can be applied to a device, a device group, and a range of devices. It can be used as shown in By Template to create the day-zero config for a newly created device.
Applying the snmp1 template, providing a value for the COMMUNITY template variable:
The result of applying the template:
The default operation for templates is to merge the configuration. Tags can be added to templates to have the template merge, replace, delete, create, or nocreate configuration. A tag is inherited by its sub-nodes until a new tag is introduced.
merge: Merge with a node if it exists, otherwise create the node. This is the default operation if no operation is explicitly set.
replace: Replace a node if it exists, otherwise create the node.
create: Creates a node. The node cannot already exist.
nocreate: Merge with a node if it exists. If it does not exist, it will not be created.
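The four tag descriptions above can be modeled with a small Python sketch. It is only an illustration under simplifying assumptions: the configuration is a flat dictionary of nodes, and the function `apply_tag` and the sample node names are hypothetical, not part of NSO. The delete tag is left out because it is not described in the list above.

```python
def apply_tag(config: dict, key: str, value: dict, tag: str = "merge") -> dict:
    """Return a new config after applying `value` at node `key` under `tag`."""
    result = dict(config)
    exists = key in result
    if tag == "merge":
        # Merge with the node if it exists, otherwise create it.
        result[key] = {**result.get(key, {}), **value}
    elif tag == "replace":
        # Replace the node if it exists, otherwise create it.
        result[key] = dict(value)
    elif tag == "create":
        # The node must not already exist.
        if exists:
            raise ValueError(f"node {key!r} already exists")
        result[key] = dict(value)
    elif tag == "nocreate":
        # Merge only if the node exists; never create it.
        if exists:
            result[key] = {**result[key], **value}
    else:
        raise ValueError(f"unsupported tag {tag!r}")
    return result


config = {"snmp": {"community": "public"}}
print(apply_tag(config, "snmp", {"location": "lab"}, "merge"))
print(apply_tag(config, "snmp", {"location": "lab"}, "replace"))
print(apply_tag(config, "ntp", {"server": "10.0.0.1"}, "nocreate"))
```

The practical difference shows up when the node is missing: merge and replace create it, nocreate silently skips it, and create is the only tag that fails when the node is already present.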
Example of how to set a tag:
Displaying tags information:
By adding the CLI pipe flag debug template when applying a template, the CLI will output detailed information on what is happening when the template is being applied:
oper-state and admin-state
NSO differentiates between oper-state and admin-state for a managed device. oper-state is the actual state of the device. We have chosen to implement a very simple oper-state model. A managed device's oper-state is either enabled or disabled. oper-state can be mapped to an alarm for the device. If the device is disabled, we may have additional error information. For example, the ce9 device created from another device and the ce10 device created with a device template in the previous section are disabled: no connection has been established with them, so their state is completely unknown:
Or, a slightly more interesting CLI usage:
If you manually stop a managed device, for example ce0, NSO doesn't immediately indicate that. NSO may have an active SSH connection to the device, but the device may voluntarily choose to close its end of that (idle) SSH connection. Thus the fact that a socket from the device to NSO is closed by the managed device doesn't indicate anything. The only certain method NSO has to decide that a managed device is non-operational, from the point of view of NSO, is that NSO cannot SSH connect to it. If you manually stop managed device ce0, you still have:
NSO cannot draw any conclusions from the fact that a managed device closed its end of the SSH connection. It may have done so because it decided to time out an idle SSH connection. If, however, NSO tries to initiate any operation towards the dead device, the device is marked as oper-state disabled:
Now that NSO has failed to connect to it, NSO knows that ce0 is dead:
This concludes the oper-state discussion. The next state to be illustrated is the admin-state. The admin-state is what the operator configures; this is the desired state of the managed device.
In tailf-ncs.yang we have the following configuration definition for admin-state:
In the example above (tailf-ncs-devices.yang - admin-state), you can see the four different admin states for a managed device as defined in the YANG model.
locked - This means that all changes to the device are forbidden. Any transaction which attempts to manipulate the configuration of the device will fail. It is still possible to read the configuration of the device.
unlocked - This is the state a device is set into when the device is operational. All changes to the device are attempted to be sent southbound.
southbound-locked - This is the default value. It means that it is possible to manipulate the configuration of the device but changes done to the device configuration are never pushed to the device. This mode is useful during e.g. pre-provisioning, or when we instantiate new devices.
config-locked - This means that any transaction which attempts to manipulate the configuration of the device will fail. It is still possible to read the configuration of the device and send live-status commands or RPCs.
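As a compact summary, the four admin states can be tabulated in a short Python sketch. The table and helper names (`ADMIN_STATE_RULES`, `can_modify_config`, `is_pushed_southbound`) are hypothetical and simply encode the descriptions above. The sketch does not model reads, live-status commands, or RPCs, which is where locked and config-locked differ.

```python
# state -> (configuration changes accepted, changes pushed to the device)
ADMIN_STATE_RULES = {
    "locked": (False, False),
    "unlocked": (True, True),
    "southbound-locked": (True, False),
    "config-locked": (False, False),
}


def can_modify_config(admin_state: str) -> bool:
    """True if a transaction may change the device configuration in NSO."""
    return ADMIN_STATE_RULES[admin_state][0]


def is_pushed_southbound(admin_state: str) -> bool:
    """True if accepted changes are also sent to the managed device."""
    return ADMIN_STATE_RULES[admin_state][1]


for state in ADMIN_STATE_RULES:
    print(state, can_modify_config(state), is_pushed_southbound(state))
```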
NSO manages a set of devices that are given to NSO through any means, such as the CLI, inventory system integration through XML APIs, or configuration files at startup. The list of devices to manage in an overall integrated network management solution is shared between different tools, and therefore it is important to keep an authoritative database of this and share it between different tools including NSO. The purpose of this part is to identify the source of the population of managed devices. The source attribute should indicate the source of the managed device like "inventory", "manual", or "EMS".
These attributes should be automatically set by the integration towards the inventory source, rather than manipulated manually.
added-by-user: Identify the user who loaded the managed device.
context: In what context was the device loaded.
when: When the device was added to NSO.
from-ip: From which IP the load activity was run.
source: Identify the source of the managed device such as the inventory system name or the name of the source file.
The NETCONF protocol mandates that the first thing both the server and the client have to do is to send their list of NETCONF capabilities in the <hello> message. A capability indicates what the peer can do. For example, the validate:1.0 capability indicates that the server can validate a proposed configuration change, whereas the capability http://acme.com/if indicates that the device implements the http://acme.com proprietary capability.
The NEDs report the capabilities of the devices at connection time. The NEDs also load the YANG modules for NSO. For a NETCONF/YANG device, all this is straightforward; for non-NETCONF devices, the NEDs do the translation.
The capabilities announced by a device also contain the YANG version 1 modules supported. In addition to this, YANG version 1.1 modules are advertised in the YANG library module on the device. NSO checks both the capabilities and the YANG library to find out which YANG modules a device supports.
The capabilities and modules detected by NSO are available in two different lists, /devices/device/capability and /devices/device/module. The capability list contains all capabilities announced and all YANG modules in the YANG library. The module list contains all YANG modules announced that are also supported by the NED in NSO.
NSO can be used to handle all or some of the YANG configuration modules for a device. A device may announce several modules through its capability list which NSO ignores. NSO will only handle the YANG modules for a device which are loaded (and compiled through ncsc --ncs-compile-bundle or ncsc --ncs-compile-module); all other modules for the device are ignored. If you require a situation where NSO is entirely responsible for a device so that complete device backup/configurations are stored in NSO, you must ensure NSO indeed has support for all modules for the device. It is not possible to automate this process since a capability URI doesn't necessarily indicate actual configuration.
When a device is added to NSO, its NED ID must be set. For a NETCONF device, it is possible to configure the generic NETCONF NED ID netconf (defined in the YANG module tailf-ncs-ned). If this NED ID is configured, we can then ask NSO to connect to the device and then check the capability list to see which modules this device implements.
We can also check which modules the loaded NEDs support. Then we can pick the most suitable NED and configure the device with this NED ID.
NSO works best if the managed devices support the NETCONF candidate configuration datastore. However, NSO reads the capabilities of each managed device and executes different sequences of NETCONF commands towards different types of devices.
For implementations of the NETCONF protocol that do not support the candidate datastore, and in particular, devices that do not support NETCONF commit with a timeout, NSO tries to make the best of the situation.
NSO divides devices into the following groups.
start_trans_running: This mode is used for devices that support the Tail-f proprietary transaction extension defined by http://tail-f.com/ns/netconf/transactions/1.0. Read more on this in the Tail-f ConfD user guide. In principle, it is a means to control transaction processing towards the running data store over the NETCONF interface. This may be more efficient than going through the candidate data store. The downside is that it is Tail-f proprietary non-standardized technology.
lock_candidate: This mode is used for devices that support the candidate data store but disallow direct writes to the running data store.
lock_reset_candidate: This mode is used for devices that support the candidate data store and also allow direct writes to the running data store. This is the default mode for the Tail-f ConfD NETCONF server. Since the running data store is configurable, we must, before each configuration attempt, copy all of the running to the candidate. (ConfD has optimized this particular usage pattern, so this is a very cheap operation for ConfD.)
startup: This mode is used for devices that have writable running, no candidate but do support the startup data store. This is the typical mode for Cisco-like devices.
running-only: This mode is used for devices that only support writable running.
NED: The transaction is controlled by a Network Element Driver. The exact transaction mode depends on the type of the NED.
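One way to read the list above is as a decision procedure over the capabilities a device announces. The Python sketch below is an interpretation of that list, not NSO's actual selection logic: the function `classify`, the order of the checks, and the "unclassified" fallback are assumptions. The capability URIs for candidate, writable-running, and startup are the standard NETCONF ones. The NED mode is not derived from capabilities and is therefore not modeled.

```python
from typing import Iterable

TRANSACTIONS = "http://tail-f.com/ns/netconf/transactions/1.0"
CANDIDATE = "urn:ietf:params:netconf:capability:candidate:1.0"
WRITABLE_RUNNING = "urn:ietf:params:netconf:capability:writable-running:1.0"
STARTUP = "urn:ietf:params:netconf:capability:startup:1.0"


def classify(capabilities: Iterable[str]) -> str:
    """Map announced capabilities to one of the modes described above.

    The precedence used here is an assumption made for this sketch.
    """
    caps = set(capabilities)
    if TRANSACTIONS in caps:
        return "start_trans_running"
    writable = WRITABLE_RUNNING in caps
    if CANDIDATE in caps:
        # Candidate plus writable running requires resetting the candidate.
        return "lock_reset_candidate" if writable else "lock_candidate"
    if writable and STARTUP in caps:
        return "startup"
    if writable:
        return "running-only"
    return "unclassified"  # hypothetical fallback, not an NSO mode


print(classify([CANDIDATE, WRITABLE_RUNNING]))
print(classify([WRITABLE_RUNNING, STARTUP]))
```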
Which category NSO chooses for a managed device depends on which NETCONF capabilities the device sends to NSO in its NETCONF hello message. You can see in the CLI what NSO has decided for a device as in:
NSO is talking to a ConfD device running in its standard configuration, thus lock-reset-candidate.
Another important discriminator between managed devices is whether they support the confirmed commit with a timeout capability, i.e., the confirmed-commit:1.0 standard NETCONF capability. If a device supports this capability, NSO utilizes it. This is the case with, for example, Juniper routers.
If a managed device does not support this capability, NSO attempts to do the best it can.
This is how NSO handles common failure scenarios:
The operator aborts the transaction, or NSO loses the SSH connection to another managed device which is also participating in the same network transaction. If the device does support the confirmed-commit capability, NSO aborts the outstanding yet-uncommitted transaction simply by closing the SSH connection. When the device does not support the confirmed-commit capability, NSO has the reverse diff and simply sends the precise undo information to the device instead.
The device rejects the transaction in the first place, i.e., it rejects NSO's attempt to modify its running data store. This is an easy case since NSO then simply aborts the transaction as a whole in the initial commit confirmed [time] attempt.
NSO loses SSH connectivity to the device during the timeout period. This is a real error case and the configuration is now in an unknown state. NSO will abort the entire transaction, but the configuration of the failing managed device is now probably in error. The correct procedure once network connectivity has been restored to the device is to sync it in the direction from NSO to the device. The NSO copy of the device configuration will be what was configured before the failed transaction.
Thus, even if not all participating devices have first-class NETCONF server implementations, NSO will attempt to fake the confirmed-commit capability.
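The "reverse diff" used for devices without confirmed-commit can be pictured with a minimal Python sketch. It assumes a flat key/value configuration, and the helpers `reverse_diff` and `apply_change` are hypothetical names for illustration; they are not NSO functions. The property shown is that applying a change and then its reverse diff restores the original configuration.

```python
_MISSING = object()  # marks "this key did not exist before the change"


def reverse_diff(before: dict, change: dict) -> dict:
    """Record, for every key the change touches, the value it had before."""
    return {key: before.get(key, _MISSING) for key in change}


def apply_change(config: dict, change: dict) -> dict:
    """Return a new config with the change applied; _MISSING deletes a key."""
    result = dict(config)
    for key, value in change.items():
        if value is _MISSING:
            result.pop(key, None)
        else:
            result[key] = value
    return result


before = {"mtu": 1500}
change = {"mtu": 9000, "description": "uplink"}
undo = reverse_diff(before, change)      # computed before sending the change
after = apply_change(before, change)     # the device now has the new config
restored = apply_change(after, undo)     # sending the undo restores the old one
print(restored == before)
```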
When the managed device defines top-level NETCONF RPCs or, alternatively, defines tailf:action points inside the YANG model, these RPCs and actions are also imported into the data model that resides in NSO.
For example, the Juniper NED comes with a set of JunOS RPCs defined in: $NCS_DIR/packages/neds/juniper-junos/src/yang/junos-rpc.yang
Thus, since all RPCs and actions from the devices are accessible through the NSO data model, these actions are also accessible through all NSO northbound APIs, REST, JAVA MAAPI, etc. Hence it is possible to invoke actions and RPCs on all managed devices from user scripts/code. The RPCs are augmented below an RPC container:
In the simulated environment of the mpls-vpn example, these RPCs might not have been implemented.
The NSO device manager has a concept of groups of devices. A group is nothing more than a named group of devices. What makes this interesting is that we can invoke several different actions on the group, thus implicitly invoking the action on all members of the group. This is especially interesting for the apply-template action.
The definition of device groups resides at the same layer in the NSO data model as the device list, thus we have:
The MPLS VPN example comes with a couple of pre-defined device-groups:
Device groups are created like below:
Device groups can reference other device groups. There is an operational attribute that flattens all members in the group. The CLI sequence below adds the PE group to my-group. Then it shows the configuration of that group followed by the status of this group. The status for the group contains a members attribute that lists all device members.
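The flattening of nested groups into a single member list can be sketched in Python as follows. The dictionary layout, the keys "device-name" and "device-group", the sample device names, and the function `flatten_members` are assumptions made for this illustration; the sketch is not how NSO computes the members attribute.

```python
from typing import Dict, List, Optional, Set


def flatten_members(
    group: str,
    groups: Dict[str, Dict[str, List[str]]],
    _seen: Optional[Set[str]] = None,
) -> Set[str]:
    """Collect all devices of a group, following references to other groups."""
    seen = set() if _seen is None else _seen
    if group in seen:
        # Guard against groups that reference each other in a cycle.
        return set()
    seen.add(group)
    entry = groups[group]
    members = set(entry.get("device-name", []))
    for sub_group in entry.get("device-group", []):
        members |= flatten_members(sub_group, groups, seen)
    return members


groups = {
    "PE": {"device-name": ["pe0", "pe1"]},
    "my-group": {"device-name": ["ce0"], "device-group": ["PE"]},
}
print(sorted(flatten_members("my-group", groups)))
```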
Once you have a group, you can sync and check-sync the entire group.
However, what makes device groups really interesting is the ability to apply a template to a group. You can use the pre-populated templates to apply SNMP settings to device groups.
Policies allow you to specify network-wide constraints that must always be true. If someone tries to apply a configuration change over any northbound interface that would cause a policy to evaluate to false, the configuration change is rejected by NSO. Policies can be of type warning, which means that it is possible to override them, or of type error, which cannot be overridden.
Assume you would like to enforce all CE routers to have a Gigabit interface 0/1.
As seen in the example above (Policies), a policy rule has an optional foreach statement and a mandatory expression and error message. The foreach statement evaluates to a node set, and the expression is then evaluated on each node. So in this example, the expression would be evaluated for every device in NSO whose name begins with ce. The name variable in the warning message refers to a leaf available from the foreach node set.
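The evaluation order described above (select a node set, evaluate the expression on each node, report a message for each failure) can be mimicked in plain Python. NSO policies are written as XPath expressions over the data model; the callables, the data layout, and the function `check_policy` below are hypothetical stand-ins used only to show the control flow.

```python
from typing import Callable, Dict, List


def check_policy(
    devices: Dict[str, dict],
    foreach: Callable[[str], bool],
    expr: Callable[[dict], bool],
    message: str,
) -> List[str]:
    """Evaluate `expr` on every device selected by `foreach`.

    Returns one formatted message per device for which `expr` is false.
    """
    violations = []
    for name, config in devices.items():
        if foreach(name) and not expr(config):
            violations.append(message.format(name=name))
    return violations


devices = {
    "ce0": {"interfaces": ["GigabitEthernet0/1"]},
    "ce1": {"interfaces": []},
    "pe0": {"interfaces": []},
}
print(check_policy(
    devices,
    foreach=lambda name: name.startswith("ce"),
    expr=lambda cfg: "GigabitEthernet0/1" in cfg["interfaces"],
    message="{name} should have a Gigabit interface 0/1",
))
```

Note that pe0 is never evaluated because it falls outside the selected node set, even though it also lacks the interface.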
Validation is always performed at commit but can also be requested interactively.
Note that any configuration can be activated or deactivated. This means that to temporarily turn off a certain policy, you can deactivate it. Note also that if the configuration was changed by means other than NSO, for example by local tools on the device such as its CLI, a devices sync-from operation might fail if the device configuration violates the policy.
One of the strengths of NSO is the concept of network-wide transactions. When you commit data to NSO that spans multiple devices in the /ncs:devices/device tree, NSO will, within the NSO transaction, commit the data on all devices or none, keeping the network consistent with CDB. The NSO transaction doesn't return until all participants have acknowledged the proposed configuration change. The downside of this is that the slowest device in each transaction limits the overall transactional throughput in NSO. Such things as out-of-sync checks, network latency, calculation of changes sent southbound, or device deficiencies all affect the throughput.
Typically, when automation software north of NSO generates network change requests, it may very well be the case that more requests arrive than can be handled. In NSO deployment scenarios where you wish to have higher transactional throughput than what is possible using network-wide transactions, you can use the commit queue instead. The goal of the commit queue is to increase the transactional throughput of NSO while keeping an eventual consistency view of the database. With the commit queue, NSO will compute the configuration change for each participating device, put it in an outbound queue item, and immediately return. The queue is then independently run.
Another use case where you can use the commit queue is when you wish to push a configuration change to a set of devices and don't care about whether all devices accept the change or not. You do not want the default behavior for transactions which is to reject the transaction as a whole if one or more participating devices fail to process its part of the transaction.
An example of the above could be if you wish to set a new NTP server on all managed devices in the entire network. If one or more devices are currently non-operational, you still want to push out the change. You also want the change automatically pushed to the non-operational devices once they go live again.
The big upside of this scheme is that the transactional throughput through NSO is considerably higher. Also, transient devices are handled better. The downsides are:
If a device rejects the proposed change, NSO and the device are now out of sync until any error recovery is performed. Whenever this happens, an NSO alarm (called commit-through-queue-failed) is generated.
While a transaction remains in the queue, i.e., it has been accepted for delivery by NSO but is not yet delivered, the view of the network in NSO is not (yet) correct. Eventually, though, the queued item will be delivered, thus achieving eventual consistency.
To facilitate the two use cases of the commit queue the outbound queue item can be either in an atomic or non-atomic mode.
In atomic mode the outbound queue item will push all configuration changes concurrently once there are no intersecting devices ahead in the queue. If any device rejects the proposed change, all device configuration changes in the queue item will be rejected as a whole, leaving the network in a consistent state. The atomic mode also allows for automatic error recovery to be performed by NSO.
In the non-atomic mode, the outbound queue item will push configuration changes for a device whenever all occurrences of it are completed or it doesn't exist ahead in the queue. The drawback to this mode is that there is no automatic error recovery that can be performed by NSO.
In the following sequences, the simulated device ce0 is stopped to illustrate the commit queue. This can be achieved by the following sequence including returning to the NSO CLI config mode:
By default, the commit queue is turned off. You can configure NSO to run a transaction, device, or device group through the commit queue in a number of different ways, either by providing a flag to the commit command as:
Or, by configuring NSO to always run all transactions through the commit queue as in:
Or, by configuring a number of devices to run through the commit queue as default:
When enabling the commit queue as default on a per device/device group basis, an NSO transaction will compute the configuration change for each participating device, put the devices enabled for the commit queue in the outbound queue, and then proceed with the normal transaction behavior for those devices not commit queue enabled. The transaction will still be successfully committed even if some of the devices added to the outbound queue will fail. If the transaction fails in the validation phase the entire transaction will be aborted, including the configuration change for those devices added to the commit queue. If the transaction fails after the validation phase, the configuration change for the devices in the commit queue will still be delivered.
Do some changes and commit through the commit queue:
In the example above (Commit through Commit Queue), the commit affected three devices, ce0, ce1, and ce2. If you had immediately launched yet another transaction, as in the second one (see example below), manipulating an interface of ce2, that transaction would have been queued instead of immediately launched. The idea here is to queue entire transactions that touch any device that has anything queued ahead in the queue.
Each transaction committed through the queue becomes a queue item. A queue item has an ID number. A bigger number means that it's scheduled later. Each queue item waits for something to happen. A queue item is in one of three states.
waiting: The queue item is waiting for other queue items to finish. This is because the waiting queue item has participating devices that are part of other queue items ahead in the queue. It is waiting for that set of devices to no longer occur ahead of itself in the queue.
executing: The queue item is currently being processed. Multiple queue items can run concurrently as long as they don't share any managed devices. Transient errors might be present. These errors occur when NSO fails to communicate with some of the devices. The errors are shown in the leaf-list transient-errors. Retries will take place at intervals specified in /ncs:devices/global-settings/commit-queue/retry-timeout. Examples of transient errors are connection failures and changes being rejected due to the device being locked. Transient errors are potentially bad since the queue might grow if new items are added, waiting for the same device.
locked: This queue item is locked and will not be processed until it has been unlocked, see the action /ncs:devices/commit-queue/queue-item/unlock. A locked queue item will block all subsequent queue items that are using any device in the locked queue item.
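The relationship between the three states can be sketched as a small Python model. It is a simplification for illustration only: the function `queue_states`, the tuple layout, and the item IDs are hypothetical, and the model ignores retries, atomicity, and completion of items.

```python
from typing import Dict, Iterable, List, Tuple

QueueItem = Tuple[int, Iterable[str], bool]  # (item id, devices, locked flag)


def queue_states(items: List[QueueItem]) -> Dict[int, str]:
    """Derive a state for each queue item, ordered from head to tail.

    An item waits if any item ahead of it uses one of its devices; a locked
    item is not processed, and its devices still block the items behind it.
    """
    states = {}
    devices_ahead = set()
    for item_id, devices, locked in items:
        item_devices = set(devices)
        if locked:
            states[item_id] = "locked"
        elif item_devices & devices_ahead:
            states[item_id] = "waiting"
        else:
            states[item_id] = "executing"
        devices_ahead |= item_devices
    return states


queue = [
    (1, ["ce0", "ce1", "ce2"], False),
    (2, ["ce2"], False),   # shares ce2 with item 1, so it has to wait
    (3, ["pe0"], False),   # no shared devices, runs concurrently with item 1
]
print(queue_states(queue))
```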
You can view the queue in the CLI. There are three different view modes, summary, normal, and detailed. Depending on the output, both the summary and the normal look good:
The age field indicates how many seconds a queue item has been in the queue.
You can also view the queue items in detailed mode:
The queue items are stored persistently, thus if NSO is stopped and restarted, the queue remains the same. Similarly, if NSO runs in HA (High Availability) mode, the queue items are replicated, ensuring the queue is processed even in case of failover.
The commit queue is disabled when HA is enabled and the node's HA role is none, i.e., not primary or secondary. See Mode of Operation.
A number of useful actions are available to manipulate the queue:
devices commit-queue add-lock device [ ... ]. This adds a fictive queue item to the commit queue. Any queue item affecting the same devices that enters the commit queue will have to wait for this lock item to be unlocked or deleted. If no devices are specified, all devices in NSO are locked.
devices commit-queue clear. This action clears the entire queue. All devices present in the commit queue will, after this action has been executed, be out of sync. The clear action is a rather blunt tool and is not recommended to be used in any normal use case.
devices commit-queue prune device [ ... ]. This action prunes all specified devices from all queue items in the commit queue. The affected devices will, after this action has been executed, be out of sync. Devices that are currently being committed to will not be pruned unless the force option is used. Atomic queue items will not be affected, unless all devices in it are pruned. The force option will brutally kill an ongoing commit. This could leave the device in a bad state. It is not recommended in any normal use case.
devices commit-queue set-atomic-behaviour atomic [ true,false ]. This action sets the atomic behavior of all queue items. If these are set to false, the devices contained in these queue items can start executing if the same devices in other non-atomic queue items ahead of it in the queue are completed. If set to true, the atomic integrity of these queue items is preserved.
devices commit-queue wait-until-empty. This action waits until the commit queue is empty. The default is to wait infinity. A timeout can be specified to wait for a number of seconds. The result is empty if the queue is empty or timeout if there are still items in the queue to be processed.
devices commit-queue queue-item [ id ] lock. This action puts a lock on an existing queue item. A locked queue item will not start executing until it has been unlocked.
devices commit-queue queue-item [ id ] unlock. This action unlocks a locked queue item. Unlocking a queue item that is not locked is silently ignored.
devices commit-queue queue-item [ id ] delete. This action deletes a queue item from the queue. If other queue items are waiting for this (deleted) item, they will all automatically start to run. The devices of the deleted queue item will, after the action has been executed, be out of sync if they haven't started executing. Any error option set for the queue item will also be disregarded. The force option will brutally kill an ongoing commit. This could leave the device in a bad state. It is not recommended in any normal use case.
devices commit-queue queue-item [ id ] prune device [ ... ]. This action prunes the specified devices from the queue item. Devices that are currently being committed to will not be pruned unless the force option is used. Atomic queue items will not be affected, unless all devices in it are pruned. The force option will brutally kill an ongoing commit. This could leave the device in a bad state. It is not recommended in any normal use case.
devices commit-queue queue-item [ id ] set-atomic-behaviour atomic [ true,false ]. This action sets the atomic behavior of this queue item. If this is set to false, the devices contained in this queue item can start executing if the same devices in other non-atomic queue items ahead of it in the queue are completed. If set to true, the atomic integrity of the queue item is preserved.
devices commit-queue queue-item [ id ] wait-until-completed. This action waits until the queue item is completed. The default is to wait infinity. A timeout can be specified to wait for a number of seconds. The result is completed if the queue item is completed or timeout if the timer expired before the queue item was completed.
devices commit-queue queue-item [ id ] retry. This action retries devices with transient errors instead of waiting for the automatic retry attempt. The device option will let you specify the devices to retry.
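The semantics of the two waiting actions (block until done, or give up after an optional timeout) follow a common polling pattern, sketched below in Python. The function `wait_until_empty`, its parameters, and the polling approach are assumptions for illustration; the sketch does not call NSO and says nothing about how NSO implements the action internally.

```python
import time
from typing import Callable, Optional


def wait_until_empty(
    queue_len: Callable[[], int],
    timeout: Optional[float] = None,
    poll_interval: float = 0.01,
) -> str:
    """Poll until the queue is empty; return "empty" or "timeout".

    With timeout=None the function waits indefinitely, mirroring the
    default of waiting "infinity" described above.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while queue_len() > 0:
        if deadline is not None and time.monotonic() >= deadline:
            return "timeout"
        time.sleep(poll_interval)
    return "empty"


print(wait_until_empty(lambda: 0))
print(wait_until_empty(lambda: 2, timeout=0.05))
```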
A typical use scenario is where one or more devices are not operational. In the example above (Viewing Queue Items), there are two queue items, waiting for the device ce0 to come alive. ce0 is listed as a transient error, and this is blocking the entire queue. Whenever a queue item is blocked because another item ahead of it cannot connect to a specific managed device, an alarm is generated:
Block other queue items affecting device ce0 from entering the commit queue:
Now queue item 9577950918 is blocking other items using ce0 from entering the queue.
Prune the usage of the device ce0 from all queue items in the commit queue:
The lock will be in the queue until it has been deleted or unlocked. Queue items affecting other devices are still allowed to enter the queue.
Fix the problem with the device ce0, remove the lock item, and sync from the device:
In an LSA cluster, each remote NSO has its own commit queue. When committing through the commit queue on the upper node NSO will automatically create queue items on the lower nodes where the devices in the transaction reside. The progress of the lower node queue items is monitored through a queue item on the upper node. The remote NSO is treated as a device in the queue item and the remote queue items and devices are opaque to the user of the upper node.
Generally, it is not recommended to interfere with the queue items of the lower nodes that have been created by an upper NSO. This can cause the upper queue item to not synchronize with the lower ones correctly.
To be able to track the commit queue on the lower cluster nodes, NSO uses the built-in stream ncs-events that generates northbound notifications for internal events. This stream is required if running the commit queue in a clustered scenario. It is enabled in ncs.conf:
In addition, the commit queue needs to be enabled in the cluster configuration.
For more detailed information on how to set up clustering, see LSA Overview.
The goal of the commit queue is to increase the transactional throughput of NSO while keeping an eventually consistent view of the database. This means that, whether changes committed through the commit queue originate as pure device changes or as the effect of service manipulations, the effects on the network should eventually be the same as if they were performed without a commit queue, whether they succeed or not. This applies to a single NSO node as well as to NSO nodes in an LSA cluster.
Depending on the selected error-option
NSO will store the reverse of the original transaction to be able to undo the transaction changes and get back to the previous state. This data is stored in the /ncs:devices/commit-queue/completed
tree from where it can be viewed and invoked with the rollback
action. When invoked the data will be removed.
The error option can be configured under /ncs:devices/global-settings/commit-queue/error-option
. Possible values are: continue-on-error
, rollback-on-error
, and stop-on-error
. The continue-on-error
value means that the commit queue will continue on errors. No rollback data will be created. The rollback-on-error
value means that the commit queue item will roll back on errors. The commit queue will place a lock on the failed queue item, thus blocking other queue items with overlapping devices from being executed. The rollback
action will then automatically be invoked when the queue item has finished its execution. The lock will be removed as part of the rollback. The stop-on-error
value means that the commit queue will place a lock on the failed queue item, thus blocking other queue items with overlapping devices from being executed. The lock must then either be released manually once the error is fixed, or the rollback
action under /devices/commit-queue/completed
be invoked. The rollback
action is defined as:
The error option can also be given as a commit parameter.
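The differences between the three error options can be summarized in a small Python model. This is only an illustrative sketch of the behavior described above, not NSO code: the `FailureOutcome` type and the `outcome_on_failure` function are hypothetical names, and the assumption that stop-on-error keeps rollback data is inferred from the statement that the rollback action can be invoked afterwards.

```python
from dataclasses import dataclass

# The three option names come from the text. Everything else in this
# block is a hypothetical model for illustration, not an NSO API.
ERROR_OPTIONS = ("continue-on-error", "rollback-on-error", "stop-on-error")


@dataclass(frozen=True)
class FailureOutcome:
    rollback_data_stored: bool  # reverse of the transaction is kept for the rollback action
    queue_lock_placed: bool     # queue items with overlapping devices are blocked
    rollback_automatic: bool    # rollback action is invoked without operator input
    lock_released_by: str       # who or what removes the lock


def outcome_on_failure(error_option: str) -> FailureOutcome:
    """Return what happens when a queue item fails, per the description above."""
    if error_option == "continue-on-error":
        # The queue simply continues; no rollback data and no lock.
        return FailureOutcome(False, False, False, "n/a (no lock)")
    if error_option == "rollback-on-error":
        # Lock is placed, rollback runs automatically and removes the lock.
        return FailureOutcome(True, True, True, "the automatic rollback")
    if error_option == "stop-on-error":
        # Lock stays until the operator releases it or invokes rollback.
        return FailureOutcome(True, True, False, "the operator")
    raise ValueError(f"unknown error-option: {error_option!r}")


for option in ERROR_OPTIONS:
    print(option, outcome_on_failure(option))
```

The practical difference between the last two options is who removes the lock: with rollback-on-error it disappears as part of the automatic rollback, while with stop-on-error it remains until an operator acts.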
To guarantee service integrity NSO checks for overlapping service or device modifications against the items in the commit queue and returns an error if such exists. If a service instance does a shared set on the same data as a service instance in the queue actually changed, the reference count will be increased but no actual change is pushed to the device(s). This will give a false positive that the change is actually deployed in the network. The rollback-on-error
and stop-on-error
error options will automatically create a queue lock on the involved services and devices to prevent such a case.
In a clustered environment, different parts of the resulting configuration change set will end up on different lower nodes. This means on some nodes the queue item could succeed and on others, it could not.
The error option in a cluster environment will originate on the upper node. The reverse of the original transaction will be committed on this node and propagated through the cluster down to the lower nodes. The net effect of this is the state of the network will be the same as before the original change.
Because the error option in a cluster environment originates on the upper node, any error-option configuration on the lower nodes has no effect.
When NSO is recovering from a failed commit, the rollback data of the failed queue items in the cluster is applied and committed through the commit queue. In the rollback, the no-networking flag will be set on the commits towards the failed lower nodes or devices to get CDB consistent with the network. Towards the successful nodes or devices, the commit is done as before. This is what the rollback
action in /ncs:devices/commit-queue/completed/queue-item
does.
TR1; service s1
creates ce0:a
and ce1:b
. The nodes a
and b
are created in CDB. In the changes of the queue item, CQ1
, a
and b
are created.
TR2; service s2
creates ce1:c
and ce2:d
. The nodes c
and d
are created in CDB. In the changes of the queue item, CQ2
, c
, and d
are created.
The queue item from TR1
, CQ1
, starts to execute. The node a
cannot be created on the device. The node b
was created on the device but that change is reverted as a
failed to be created.
The reverse of TR1
, the rollback of CQ1
, TR3
, is committed.
TR3
; service s1
is applied with the old parameters. Thus the effect of TR1
is reverted. Nothing needs to be pushed towards the network, so no queue item is created.
TR2
; as the queue item from TR2
, CQ2
, is not the same service instance and has no overlapping data on the ce1
device, this queue item executes as normal.
NSO1
:TR1
; service s1
dispatches the service to NSO2
and NSO3
through the queue item NSO1
:CQ1
. In the changes of NSO1
:CQ1
, NSO2:s1
and NSO3:s1
are created.
NSO1
:TR2
; service s2
dispatches the service to NSO2
through the queue item NSO1
:CQ2
. In the changes of NSO1
:CQ2
, NSO2:s2
is created.
The queue item from NSO2
:TR1
, NSO2
:CQ1
, starts to execute. The node a
cannot be created on the device. The node b
was created on the device, but that change is reverted as a
failed to be created.
The queue item from NSO3
:TR1
, NSO3
:CQ1
, starts to execute. The changes in the queue item are committed successfully to the network.
The reverse of TR1
, rollback of CQ1
, TR3
, is committed on all nodes part of TR1
that failed.
NSO2
:TR3
; service s1
is applied with the old parameters. Thus the effect of NSO2
:TR1
is reverted. Nothing needs to be pushed towards the network, so no queue item is created.
NSO1
:TR3
; service s1
is applied with the old parameters. Thus the effect of NSO1
:TR1
is reverted. A queue item is created to push the transaction changes to the lower nodes that didn't fail.
NSO3
:TR3
; service s1
is applied with the old parameters. Thus the effect of NSO3
:TR1
is reverted. Since the changes in the queue item NSO3
:CQ1
were successfully committed to the network, a new queue item NSO3
:CQ3
is created to revert those changes.
If for some reason the rollback transaction fails there are, depending on the failure, different techniques to reconcile the services involved:
Make sure that the commit queue is blocked to not interfere with the error recovery procedure. Do a sync-from on the non-completed device(s) and then re-deploy the failed service(s) with the reconcile
option to reconcile original data, i.e., take control of that data. This option acknowledges other services controlling the same data. The reference count will indicate how many services control the data. Release any queue lock that was created.
Make sure that the commit queue is blocked to not interfere with the error recovery procedure. Use un-deploy with the no-networking option on the service and then do sync-from on the non-completed device(s). Make sure the error is fixed and then re-deploy the failed service(s) with the reconcile
option. Release any queue lock that was created.
As the goal of the commit queue is to increase the transactional throughput of NSO it means that we need to calculate the configuration change towards the device(s) outside of the transaction lock. To calculate a configuration change NSO needs a pre-commit running and a running view of the database. The key enabler to support this in the commit queue is to allow different views of the database to live beyond the commit. In NSO, this is implemented by keeping a snapshot database of the configuration tree for devices and storing configuration changes towards this snapshot database on a per-device basis. The snapshot database is updated when a device in the queue has been processed. This snapshot database is stored on disk for persistence (the S.cdb
file in the ncs-cdb
directory).
The snapshot database can be populated in two ways. This is controlled by the /ncs-config/cdb/snapshot/pre-populate
setting in the ncs.conf
file. The parameter controls whether the snapshot database should be pre-populated during the upgrade or not. Switching this on or off implies different trade-offs.
If set to false
, NSO is optimized for the default transaction behavior. The snapshot database is populated in a lazy manner (when a device is committed through the commit queue for the first time after an upgrade). The drawback is that this commit will suffer performance-wise, which is especially true for devices with large configurations. Subsequent commits on the same device will not have the same penalty.
If true
, NSO is optimized for systems using the commit queue extensively. This will lead to better performance when committing using the commit queue, with no additional penalty for first-time commits. The drawbacks are that upgrades take longer and that NSO memory consumption increases almost twofold.
The NSO device manager has built-in support for the NETCONF Call Home client protocol operations over SSH as defined in RFC 8071.
With NETCONF SSH Call Home, the NETCONF client listens for TCP connection requests from NETCONF servers. The SSH client protocol is started when the connection is accepted. The SSH client validates the server's presented host key with credentials stored in NSO. If no matching host key is found the TCP connection is closed immediately. Otherwise, the SSH connection is established, and NSO is enabled to communicate with the device. The SSH connection is kept open until the device itself terminates the connection, an NSO user disconnects the device, or the idle connection timeout is triggered (configurable in the ncs.conf
file).
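The host key check described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how NSO implements it: the function name `accept_call_home`, the in-memory `stored_keys` mapping, and the byte-string keys are all hypothetical stand-ins for the credentials stored in NSO.

```python
import hmac
from typing import Dict, List, Optional


def accept_call_home(presented_key: bytes,
                     stored_keys: Dict[str, List[bytes]]) -> Optional[str]:
    """Return the name of the device whose stored host key matches.

    Returns None when no stored key matches, in which case the caller
    should close the TCP connection immediately.
    """
    for device, keys in stored_keys.items():
        for key in keys:
            # compare_digest compares in constant time for equal-length inputs
            if hmac.compare_digest(key, presented_key):
                return device
    return None


# Hypothetical stored host keys for two devices.
stored = {"ce0": [b"host-key-A"], "ce1": [b"host-key-B"]}
print(accept_call_home(b"host-key-B", stored))  # known key: device name
print(accept_call_home(b"host-key-X", stored))  # unknown key: None
```

Note that the lookup runs from key to device: with Call Home the device initiates the connection, so the presented host key is what identifies which configured device is calling.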
NSO will generate an asynchronous notification event whenever there is a connection request. An application can subscribe to these events and, for example, add an unknown device to the device tree with the information provided, or invoke actions on the device if it is known.
If an SSH connection is established, any outstanding configuration in the commit queue for the device will be pushed. Any notification stream for the device will also be reconnected.
NETCONF Call Home is enabled and configured under /ncs-config/netconf-call-home
in the ncs.conf
file. By default NETCONF Call Home is disabled.
A device can be connected through the NETCONF Call Home client only if /devices/device/state/admin-state
is set to call-home
. This state prevents any southbound communication to the device unless the connection has already been established through the NETCONF Call Home client protocol.
The NSO device manager has built-in support for device notifications. Notifications are a means for the managed devices to send structured data asynchronously to the manager. NSO has native support for NETCONF event notifications (see RFC 5277) but could also receive notifications from other protocols implemented by the Network Element Drivers.
Notifications can be utilized in various use-case scenarios. They can be used to populate alarms in the Alarm manager, collect certain types of errors over time, build a network-wide audit log, react to configuration changes, etc.
The basic mode of operation is that the manager subscribes to one or more named notification channels announced by the managed device. The manager keeps an open SSH channel towards the managed device, and the managed device may then asynchronously send structured XML data on that channel.
The notification support in NSO is usable as is without any further programming. However, NSO cannot understand any semantics contained inside the received XML messages, thus for example a notification with a content of "Clear Alarm 456" cannot be processed by NSO without any additional programming.
When you add programs to interpret and act upon notifications, make sure that resulting operations are idempotent. This means that they should be able to be called any number of times while guaranteeing that side effects only occur once. The reason for this is that, for example, replaying notifications can sometimes mean that your program will handle the same notifications multiple times.
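One common way to get idempotent handling is to remember which notifications have already been processed and skip repeats. The Python sketch below illustrates the idea only; the `IdempotentHandler` class, the dictionary layout, and the choice of (device, stream, event time) as the identity of a notification are assumptions for this example, not part of any NSO API.

```python
from typing import Callable, Set, Tuple

# Hypothetical identity of a notification: (device, stream, event time).
NotificationKey = Tuple[str, str, str]


class IdempotentHandler:
    """Wrap an action so that a replayed notification triggers it only once."""

    def __init__(self, action: Callable[[dict], None]) -> None:
        self._action = action
        self._seen: Set[NotificationKey] = set()

    def handle(self, notification: dict) -> bool:
        """Run the action for unseen notifications; return True if it ran."""
        key = (notification["device"],
               notification["stream"],
               notification["event_time"])
        if key in self._seen:
            return False  # replayed notification, side effect already done
        self._action(notification)
        # Mark as seen only after the action succeeded, so that a failed
        # action can be retried when the notification is replayed.
        self._seen.add(key)
        return True


handled = []
handler = IdempotentHandler(handled.append)
event = {"device": "www0", "stream": "interface",
         "event_time": "2024-01-01T00:00:00Z", "body": "link-down"}
handler.handle(event)        # first delivery runs the action
handler.handle(dict(event))  # replay is ignored
print(len(handled))
```

In a real application the set of seen keys would need to be persisted and bounded; an in-memory set is used here only to keep the sketch short.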
In the tailf-ncs.yang
data model, you find a YANG data model that can be used to:
Set up subscriptions. A subscription is configuration data from the point of view of NSO, thus if NSO is restarted, all configured subscriptions are automatically resumed.
Inspect which named streams a managed device publishes.
View all received notifications.
Notifications must be defined at the top level of a YANG module. NSO does not currently support defining notifications inside lists or containers as specified in section 7.16 in RFC 7950.
In this section, we will use the examples.ncs/web-server-farm/basic
example.
Let's dive into an example session with the NSO CLI. In the NSO example collection, the web servers publish two NETCONF notification structures, indicating what they intend to send to any interested listeners. They all share the following YANG module:
Follow the instructions in the README file if you want to run the example: build the example, start netsim, and start NCS.
The above shows how we can inspect - as status data - which named streams the managed device publishes. Each stream also has some associated data. The data model for that looks like this:
Let's set up a subscription for the stream called interface
. The subscriptions are NSO configuration data, thus to create a subscription we need to enter configuration mode:
The above example created subscriptions for the interface
stream on all web servers, i.e. managed devices, www0
, www1
, and www2
. Each subscription must have an associated stream. The stream is, however, not the key for an NSO notification subscription; the key is a free-form text string. This is because we can have multiple subscriptions to the same stream. More on this later when we describe the filter that can be associated with a subscription. Once the notifications start to arrive, they are read by NSO and stored in stable storage as CDB operational data. They are stored under each managed device, and we can view them as:
Each received notification has some associated metadata, such as the time the event was received by NSO, which subscription and which stream is associated with the notification, and also which user created the subscription.
It is fairly instructive to inspect the XML that goes on the wire when we create a subscription and then also receive the first notification. We can do:
Thus, once the subscription has been configured, NSO continuously receives the notifications sent from the managed device and stores them in persistent CDB operational storage. The notifications are stored in a circular buffer. To set the size of the buffer, we can do:
The default value is 200. Once the size of the circular buffer is exceeded, the oldest notification is removed.
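The circular buffer behavior can be illustrated with a bounded deque in Python. This is only a model of the semantics described here, not of NSO's storage; the only value taken from the text is the default size of 200, and the integer "notifications" are placeholders.

```python
from collections import deque

DEFAULT_MAX_SIZE = 200  # default buffer size stated in the text

# A deque with maxlen silently drops items from the opposite end
# when a new item is appended to a full buffer.
buffer = deque(maxlen=DEFAULT_MAX_SIZE)

# Store 202 placeholder notifications, numbered 1..202.
for seq in range(1, 203):
    buffer.append(seq)

print(len(buffer))  # the buffer never grows past its maximum size
print(buffer[0])    # oldest notification still stored
print(buffer[-1])   # newest notification
```

After 202 notifications, the two oldest ones (1 and 2) have been dropped and the buffer holds notifications 3 through 202.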
A running subscription can be in one of three states. The YANG model has:
If a subscription is in the failed state, an optional failure-reason field indicates the reason for the failure. If a subscription fails because NSO cannot connect to the managed device, or because the managed device closed its end of the SSH socket, NSO will attempt to reconnect automatically. The reconnect attempt interval is configurable.
SNMP Notifications (v1, v2c, v3) can be received by NSO and acted upon. The SNMP receiver is a stand-alone process and, by default, all notifications are ignored. IP addresses must be opted in and a handler must be defined to take actions on certain notifications. This can be used, for example, to listen to configuration change notifications and trigger a log action or a resync.
These actions are programmed in Java, see the SNMP Notification Receiver for how to do this.
NSO can configure inactive parameters on the devices that support inactive configuration. Currently, these devices include Juniper devices and devices that announce http://tail-f.com/ns/netconf/inactive/1.0
capability. NSO itself implements http://tail-f.com/ns/netconf/inactive/1.0
capability which is formally defined in tailf-netconf-inactive
YANG module.
To recap, a node that is marked as inactive exists in the data store but is not used by the server. The nodes announced as inactive by the device will also be inactive in the device's configuration in NSO, and activating/deactivating a node in NSO will push the corresponding change to the device. This also means that for NSO to be able to manage inactive configuration, both /ncs-config/enable-inactive
and /ncs-config/netconf-north-bound/capabilities/inactive
need to be enabled in ncs.conf
.
If the inactive feature is disabled in ncs.conf
, NSO will still be able to manage devices that have inactive configuration in their datastore, but the inactive attribute will be ignored, so the data will appear as active in NSO and it would not be possible for NSO to activate/deactivate such nodes in the device.
Manage NSO alarms with the native alarm manager.
NSO embeds a generic alarm manager. It manages NSO native alarms and can easily be extended with application-specific alarms. Alarm sources can be notifications from devices, undesired states detected on services, or anything provided via the Java API.
The Alarm Manager has three main components:
Alarm List: A list of alarms in NSO. Each list entry represents an alarm state for a specific device, an object within the device, and an alarm type.
Alarm Model: For each alarm type, you can configure the mapping to, for example, X.733 alarm standard parameters that are sent as notifications northbound.
Operator Actions: Actions to set operator states on alarms such as acknowledgement, and also actions to administratively manage the alarm list such as deleting alarms.
The alarm manager is accessible over all northbound interfaces. A read-only view including an SNMP alarm table and alarm notifications is available in an SNMP Alarm MIB. This MIB is suitable for integration with SNMP-based alarm systems.
To populate the alarm list there is a dedicated Java API. This API lets a developer add alarms, change states on alarms, etc. A common usage pattern is to use the SNMP notification receiver to map a subset of the device traps into alarms.
First of all, it is important to clearly define what an alarm means: "An alarm denotes an undesirable state in a resource for which an operator action is required". Alarms are often confused with general logging and event mechanisms, thereby flooding the operator with alarms. In NSO, the alarm manager shows undesired resource states that an operator should investigate. NSO contains other mechanisms for logging in general. Therefore, NSO does not naively populate the alarm list with traps received in the SNMP notification receiver.
Before looking into how NSO handles alarms, it is important to define the fundamental concepts. We make a clear distinction between alarms and events in general. Alarms should be taken seriously and be investigated. Alarms have states; they go active with a specific severity, they change severity, and they are cleared by the resource. The same alarm may become active again. A common mistake is to confuse the operator view with the resource view. The model described so far is the resource view. The resource itself may consider the alarm cleared. The alarm manager does not automatically delete cleared alarms. An alarm that has existed in the network may still need investigation. There are dedicated actions an operator can use to manage the alarm list, for example, delete the alarms based on criteria such as cleared and date. These actions can be performed over all northbound interfaces.
Rather than viewing alarms as a list of alarm notifications, NSO defines alarms as states on objects. The NSO alarm list uses four keys for alarms: the device, the alarming object within the device, the alarm type, and an optional specific problem.
Alarm types are normally unique identifiers for a specific alarm state and are defined statically. An alarm type corresponds to the well-known X.733 alarm standard tuple event type and probable cause. A specific problem is an optional key that is string-based and can further redefine an alarm type at run-time. This is needed for alarms that are not known before a system is deployed.
Imagine a system with general digital inputs. A MIB might specify traps called input-high
, or input-low
. When defining the SNMP notification reception, an integrator might define an alarm type called "External-Alarm". input-high
might imply a major alarm and input-low
might imply clear.
At installation, some detectors report "fire-alarm" and some "door-open" alarms. This is configured at the device and sent as free text in the SNMP var-binds. This is then managed by using the specific problem field of the NSO alarm manager to separate these different alarm types.
The data model for the alarm manager is outlined below.
This means that we have a list with key: (managed device, managed object, alarm type, specific problem). In the example above, we might have the following different alarms:
Device : House1; Managed Object : Detector1; Alarm-Type : External Alarm; Specific Problem = Smoke;
Device : House1; Managed Object : Detector2; Alarm-Type : External Alarm; Specific Problem = Door Open;
Each alarm entry shows the last status change for the alarm and also a child list with all status changes sorted in chronological order.
is-cleared
: was the last state change clear?
last-status-change
: timestamp for the last status change.
last-perceived-severity
: last severity (not equal to clear).
last-alarm-text
: the last alarm text (not equal to clear).
status-change
, event-time
: the time reported by the device.
status-change
, received-time
: the time the state change was received by NSO.
status-change
, perceived-severity
: the new perceived severity.
status-change
, alarm-text
: descriptive text associated with the new alarm status.
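The structure described above, a list keyed by four values where each entry carries its status-change history, can be modeled in Python as follows. This is a rough sketch for illustration and not the NSO data model itself: the class and function names are hypothetical, and the severity string "cleared" used to mark a clear is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (device, managed object, alarm type, specific problem)
AlarmKey = Tuple[str, str, str, str]
CLEARED = "cleared"  # assumed severity value marking a clear


@dataclass
class StatusChange:
    event_time: str          # time reported by the device
    received_time: str       # time the change was received
    perceived_severity: str
    alarm_text: str


@dataclass
class Alarm:
    status_changes: List[StatusChange] = field(default_factory=list)

    def _last_non_clear(self) -> Optional[StatusChange]:
        for change in reversed(self.status_changes):
            if change.perceived_severity != CLEARED:
                return change
        return None

    @property
    def is_cleared(self) -> bool:
        """Was the last state change a clear?"""
        return (bool(self.status_changes)
                and self.status_changes[-1].perceived_severity == CLEARED)

    @property
    def last_perceived_severity(self) -> Optional[str]:
        """Last severity that is not a clear."""
        change = self._last_non_clear()
        return change.perceived_severity if change else None

    @property
    def last_alarm_text(self) -> Optional[str]:
        """Last alarm text that does not belong to a clear."""
        change = self._last_non_clear()
        return change.alarm_text if change else None


def report(alarms: Dict[AlarmKey, Alarm], key: AlarmKey,
           change: StatusChange) -> None:
    """Append a status change; a new key creates a new alarm entry."""
    alarms.setdefault(key, Alarm()).status_changes.append(change)


alarm_list: Dict[AlarmKey, Alarm] = {}
smoke = ("House1", "Detector1", "External-Alarm", "Smoke")
report(alarm_list, smoke, StatusChange("t1", "t1", "major", "smoke detected"))
report(alarm_list, smoke, StatusChange("t2", "t2", CLEARED, "input low"))
print(alarm_list[smoke].is_cleared, alarm_list[smoke].last_perceived_severity)
```

The point of the model is that a clear does not remove the alarm: the entry stays in the list with its full history, and the last non-clear severity and text remain visible.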
It is fundamental to define alarm types (specific problem) and the managed objects with a fine-grained mechanism that is still extensible. For objects, we allow a YANG instance identifier, an SNMP OID, or a string. Strings can be used when the underlying object is not modeled. We use YANG identities to define alarm types. This has the benefit that alarm types can be defined in a named hierarchy and thereby provide an extensible mechanism. To support "dynamic alarm types", so that alarms can be separated by information only available at run-time, the string-based specific problem field can also be used.
So far we have described the model based on the resource view. It is common practice to let operators manipulate the alarms corresponding to the operator's investigation. We clearly separate the resource view and the operator view; for example, there is no such thing as an operator "clearing an alarm". Rather, the alarm entries can have a corresponding alarm handling state. Operators may want to acknowledge an alarm and set the alarm state to closed or similar.
We also support some alarm list administrative actions:
Synchronize alarms: try to read the alarm states in the underlying resources and update the alarm list accordingly (this action needs to be implemented by user code for specific applications).
Purge alarms: delete entries in the alarm list based on several different filter criteria.
Filter alarms: with an XPATH as filter input, this action returns all alarms fulfilling the filter.
Compress alarms: since every entry may contain a large number of state change entries, this action compresses the history to the latest state change.
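To make the purge and compress actions concrete, here is a small Python sketch of their effect on an in-memory alarm list. It is an illustration of the semantics only, not the NSO actions: the function names, the tuple layout `(time, severity, text)`, and the "cleared" severity string are assumptions for this example.

```python
from typing import Callable, Dict, List, Tuple

AlarmKey = Tuple[str, str, str, str]  # (device, object, alarm type, specific problem)
StatusChange = Tuple[str, str, str]   # (time, severity, text)
AlarmList = Dict[AlarmKey, List[StatusChange]]


def compress(alarms: AlarmList) -> int:
    """Keep only the latest status change per alarm; return entries dropped."""
    dropped = 0
    for history in alarms.values():
        dropped += max(len(history) - 1, 0)
        del history[:-1]  # remove everything except the last entry
    return dropped


def purge(alarms: AlarmList,
          matches: Callable[[AlarmKey, List[StatusChange]], bool]) -> int:
    """Delete every alarm for which the filter returns True; return the count."""
    doomed = [key for key, history in alarms.items() if matches(key, history)]
    for key in doomed:
        del alarms[key]
    return len(doomed)


def is_cleared(key: AlarmKey, history: List[StatusChange]) -> bool:
    """Example filter: the last status change is a clear."""
    return bool(history) and history[-1][1] == "cleared"


smoke = ("House1", "Detector1", "External-Alarm", "Smoke")
door = ("House1", "Detector2", "External-Alarm", "Door Open")
alarms: AlarmList = {
    smoke: [("t1", "major", "smoke detected"), ("t2", "cleared", "input low")],
    door: [("t3", "major", "door open")],
}

print(compress(alarms))           # one older status change is dropped
print(purge(alarms, is_cleared))  # the cleared smoke alarm is deleted
print(list(alarms))               # only the door alarm remains
```

Compress shortens histories without removing any alarm, while purge removes whole alarm entries that match the filter, which is why the two are separate actions.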
Alarms can be forwarded over NSO northbound interfaces. In many telecom environments, alarms need to be mapped to X.733 parameters. We provide an alarm model where every alarm type is mapped to the corresponding X.733 parameters such as event type and probable cause. In this way, it is easy to integrate NSO alarms into whatever X.733 enumerated values the upper fault management system requires.
The central part of the YANG Alarm model tailf-ncs-alarms.yang
has the following structure.
The first part of the YANG listing above shows the definition for managed-object
type in order for alarms to refer to YANG, SNMP, and other resources. We also see basic definitions from the X.733 standard for severity levels.
Note well the definition of alarm type using YANG identities. In this way, we can create a structured alarm-type hierarchy all rooted at alarm-type
. For you to add your specific alarm types, define your own alarm types YANG file and add identities using alarm-type
as a base.
The alarm-model
container contains the mapping from alarm types to X.733 parameters used for north-bound interfaces.
The alarm-list
container is the actual alarm list where we maintain a list mapping (device, managed-object, alarm-type, specific-problem) to the corresponding alarm state changes [(time, severity, text)].
Finally, we see the northbound alarm notification and alarm administrative actions.
The NSO alarm manager has support for the operator to acknowledge alarms. We call this alarm handling. Each alarm has an associated list of alarm handling entries as:
The following typedef defines the different states an alarm can be set into.
It is of course also possible to manipulate the alarm handling list from either Java code or Javascript code running in the web browser using the js_maapi
library.
Below is a simple scenario to illustrate the alarm concepts. The example can be found in examples.ncs/service-provider/simple-mpls-vpn
.
In the above scenario, we stop two of the devices and then ask NSO to connect to all devices. This results in two alarms for pe0
and pe1
. Note that the key for the alarm is the device name, the alarm type, the full path to the object (in this case, the device and not an object within the device), and finally an empty string for the specific problem.
In the next command sequence, we start the device and request NSO to connect. This will clear the alarms.
Note that there are two status-change entries for the alarm and that the alarm is cleared. In the following scenario, we will state that the alarm is closed and finally purge (delete) all alarms that are cleared and closed (again, note the distinction between operator states and the states from the underlying resources).
Assume that you need to configure the northbound parameters. This is done using the alarm model. A logical mapping of the connection problem above is to map it to X.733 probable cause connectionEstablishmentError (22)
. This is done in the NSO CLI in the following way: